DOCUMENT RESUME

ED 155 207                    95                    TM 007 173

AUTHOR          Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.
TITLE           Time-Score Analysis in Criterion-Referenced Tests. Final Report.
INSTITUTION     Illinois Univ., Urbana.
SPONS AGENCY    National Inst. of Education (DHEW), Washington, D.C.
BUREAU NO       BR-6-0554
PUB DATE        Feb 78
GRANT           NIE-G-76-0087
NOTE            177p.; Not available in hard copy due to marginal legibility of original tables

EDRS PRICE      MF-$0.83 Plus Postage. HC Not Available from EDRS.
DESCRIPTORS     *Computer Assisted Instruction; Criterion Referenced Tests; Data Analysis; Feasibility Studies; Goodness of Fit; *Item Analysis; Mastery Learning; *Mathematical Models; Matrices; Post Secondary Education; *Reaction Time; Scores; Statistical Analysis; Test Interpretation; *Timed Tests
IDENTIFIERS     *Computer Assisted Testing; Estimation; Gamma Coefficient; Test Theory; *Weibull Distributions


ABSTRACT 

The family of Weibull distributions was investigated as a model for the distributions of response times for items in computer-based criterion-referenced tests. The fit of these distributions was, with a few exceptions, good to excellent according to the Kolmogorov-Smirnov test. For a few relatively simple items, the two-parameter gamma distribution provided better fits. The three parameters of the Weibull distribution were interpreted as follows: the location parameter represents the theoretical minimum time required; the scale parameter is related to the mean; and the shape parameter (c) is related to two kinds of difficulty indices. It also appeared that the shape parameter was related to the degree of familiarity of the item and the degree of engagement or involvement of the test takers. A function related to c was the conditional response rate, which is called the hazard rate in system-reliability literature. The c parameter was found to be sensitive to the conceptual difficulty of items that were equal according to traditional difficulty indices. An index was developed for the efficiency of lessons and this, too, was found to be related to the Weibull parameter. Finally, c was judged to be related to the optimal cutoff point for criterion-referenced tests. (Author/CTM)



Reproductions supplied by EDRS are the best that can be made from the original document.

FINAL REPORT

Project No. 6-0554
Grant No. NIE-G-76-0087

TIME-SCORE ANALYSIS IN CRITERION-REFERENCED TESTS

Kikumi K. Tatsuoka
Computer-Based Education Research Laboratory

and

Maurice M. Tatsuoka
Department of Educational Psychology
College of Education

University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

February, 1978

U.S. Department of Health, Education, and Welfare
National Institute of Education

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
NATIONAL INSTITUTE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.






ABSTRACT 



This study investigated the feasibility of using the family of Weibull distributions (a family which is widely used in system-reliability analysis) as a model for the distributions of time scores (response times) of items in criterion-referenced tests, lesson segments, and entire lessons that were implemented on the PLATO system. The items were those of a series of matrix algebra tests developed for the dual purpose of use in this study and of testing students in three statistics courses at UIUC both before and after they studied our matrix algebra course. The latter provided the lesson segments (including exercises), while the entire lessons came from the Chanute AFB CBE project and deal with special and general vehicle maintenance training.

The fits of the Weibull distributions to these various observed distributions were, on the whole, very good to excellent as gauged by the Kolmogorov-Smirnov goodness-of-fit test. However, for some items (most of which possessed certain exceptional properties in common) the two-parameter gamma distribution offered better fits. The same held true with even greater force for the exercises occurring in the matrix algebra lessons. Tentative explanations of when and why the gamma was better than the Weibull were advanced, but discovery of definitive reasons must await future research.

We would be the first to concede that we have barely scratched the surface in studying the utility of response times (time scores) along with performance scores for analyzing and evaluating data from criterion-referenced tests, both for the purpose of assessing the quality of the tests themselves and for improved testing of the examinees' abilities.

Nevertheless, we believe that we have at least demonstrated the feasibility of this approach and hope to have shown that further research along these lines is warranted. In particular, the Weibull distribution in its two-parameter form (which we used in this study), three-parameter form, or two-component composite form, long used by system analysts but apparently not widely known among educational and psychological researchers, seems to bear further investigation for this purpose.



ACKNOWLEDGEMENTS 

We wish to acknowledge the services rendered by the following persons:

Bob Baillie, expert programmer and numerical analyst, who has several publications in Mathematics of Computation and other journals, and who wrote most of the main programs.

Tamar Weaver, a highly experienced former programmer at the Ministry of Transportation of Israel, who wrote several statistical analysis routines and transformation programs.

Kay Tatsuoka, junior in mathematics and computer science at the Massachusetts Institute of Technology, who wrote several statistical analysis routines and utility programs.

Patrick Maritz, a research assistant, who made a start toward our being able to utilize the LOGIST program.

Jerry Dyer, Mark Bradley, and John Matheny, who served as junior programmers.

Julie Garrard, for editorial work and clerical help.

Curtis Tatsuoka, 15 years of age, for refining screen displays and carrying out other small programming tasks.

Bob Linn and Charles Lewis, for having their statistics-course students use our matrix algebra lessons and tests. Also, Larry Francis, former Director of the Military Training Group at CERL, for giving us access to data from the Chanute AFB CBE Project.



TABLE OF CONTENTS

1. INTRODUCTION

2. THE WEIBULL DISTRIBUTION: RATIONALE AND DERIVATION
   2.1 Derivation Based on the Conditional Response Rate
   2.2 Comparison of Several Related Distributions

3. PARAMETER ESTIMATION
   3.1 The Weibull-Distribution Parameters
   3.2 The Gamma-Distribution Parameters

4. DESCRIPTION OF DATA
   4.1 Matrix Algebra Lessons and Tests
   4.2 Chanute AFB CBE Project Lessons and Tests
   4.3 Teaching Strategies and Lesson Styles

5. ANALYSIS OF PREREVISION DATA
   5.1 The PLATO IV System and the Programs
   5.2 Weibull Fitting of Item Response-Time Data
   5.3 Characteristics of the Pretest Items
   5.4 Interpretation of Weibull Parameters
   5.5 Correlations among Weibull Parameters and Item Statistics

6. ANALYSIS OF POST-REVISION DATA
   6.1 Description of Posttest (with Some Speculations)
   6.2 Results of Analyses

7. WEIBULL AND GAMMA FITS COMPARED
   7.1 Multiplication Pretest and Posttest
   7.2 Exercises in Matrix Algebra Test that Require Only Mechanical Practice
   7.3 Instructional Units or Areas in Matrix Algebra Lessons
   7.4 The Lessons of Special and General Vehicle Training Program at Chanute Air Force Base

8. THE CORRELATES OF PROBABILITIES OF MISCLASSIFICATION BY CRITERION-REFERENCED TESTS
   8.1 Beta Binomial Model
   8.2 Evaluation of the Optimal Cutoff Scores
   8.3 Other Measures Obtained from the Evaluation Study of the Chanute Air Force Base Computer-Based Education Project
   8.4 The Results of Statistical Analyses over 27 Chanute Lessons

9. SUMMARY AND CONCLUSION

REFERENCES

APPENDICES
   A Sample Pages of Matrix Algebra Lessons
   B The Items in Matrix Algebra Test
   C Description of Contents in the Lessons of Chanute
   D Description of PLATO Programs and their Programmers
   E Tables of p-values and the Weibull Parameter
   F Graphs of Conditional Response Rate




TIME-SCORE ANALYSIS IN CRITERION-REFERENCED TESTS

1. INTRODUCTION



It is well known that one of the major problems encountered in psychometric and statistical analyses of criterion-referenced (or domain-referenced) tests stems from the fact that, because they are designed primarily for mastery testing, their scores tend to be uniformly quite high. The consequent lack of variability of scores leads to embarrassingly low reliability and validity coefficients when these are defined in the traditional way in terms of product-moment correlation coefficients. A number of authors (e.g., Harris, 1972; Huynh, 1976; Livingston, 1972) have proposed various approaches to side-stepping this problem of limited score variability by offering alternative measures of reliability and validity.

One approach that does not appear to have been exploited to date, however, is the seemingly obvious one of considering time scores, i.e., the time it takes examinees to respond to items or entire tests (assumed unspeeded), in addition to performance scores. That there is no dearth of variability in time scores is evident from casual observation. The main reason time scores have not been utilized despite this fact is probably that their accurate recording can take place only in the context of computer-aided instruction and testing, which are fairly recent developments. Another possible reason is that response times have widely been regarded as erratic phenomena not exhibiting any lawlike behavior and hence not indicative of the extent of knowledge or mastery of a subject matter. (We are here obviously excluding the use of time measures such as response latencies and time taken to learn lists of nonsense syllables, paired associates, etc., that have long and widely been used under the tightly controlled conditions of psychological experiments. Also, we are aware that one of Rasch's models [1960] involves a time measure, viz., the time required by a pupil to read a passage of a given length. But again, the situation here is a relatively controlled one. Reading a particular passage is a much more circumscribed activity than, say, taking an algebra test in which various abilities are brought to play.)

One of the present authors has been working in the field of computer-based instruction (specifically the PLATO system at the University of Illinois) for a number of years, and she has hence been informally exploring the utilization of time scores for a long time. The research described in this report is an outgrowth of this sustained interest in time scores and represents a more systematic exploration of their utility. We wish to emphasize, however, that this study makes no attempt to enhance the psychometric properties of criterion-referenced tests by offering alternative measures of reliability and validity based on time scores. (That must be deferred to some future project.) The objective of the present study, to repeat, is simply to explore in depth how time scores behave and to reveal whatever regularities and potential usefulness they may possess.

One way to check whether a variable is behaving in a systematic fashion is to examine its statistical distribution and, if it seems to be following some identifiable theoretical distribution, to see if some rationale can be adduced to explain why it might be expected to follow that particular distribution. Of course there are any number of theoretical distributions a stochastic variable may seem to be following, so it would be like looking for a needle in a haystack if there weren't some guides as to what sort of distribution might fill the bill. Since Rasch's (1960) work, parenthetically alluded to above, had led to a two-parameter gamma distribution for the time taken to read a passage of N words, this distribution was a possible candidate. However, Bree (1975) had analyzed some empirical data on problem-solving time (albeit of quite limited scope) which showed that a two-parameter negative exponential distribution offered a better fit than the two-parameter gamma distribution, thus decreasing the attractiveness of the latter.

We were therefore thinking of carrying out a larger-scale replication of Bree's study, comparing the relative goodness of fit of the gamma and negative exponential distributions and utilizing a large and increasing data base accessible to us (and in part developed by us) on the PLATO system, when a third family of distributions shown to be useful in modeling certain time-score distributions came to our attention.¹ This was the Weibull (1951) distribution which, we learned, had been (and continues to be) extensively used in the context of system-reliability theory: the study of the probability of failure, within a given time span, of a mechanical or electronic system as a function of the probabilities of failure of individual components of the system. We learned of this distribution through the works of Sato (1973) and Takeya, Sato and Sunouchi (1975), who had pioneered its application to the modeling of the cumulative response curve, i.e., the plot of the percentage of students completing an item within a given length of time, against the latter as abscissa.

¹ It was subsequently brought to our attention that Bargman (1966) had also utilized the Weibull distribution in a study of growth functions.

The justification suggested (although not explicitly stated) by Sato and his coworkers for diverting a distribution found to be descriptive of fatigue or failure time to so remote a field of application as response time for test items is as follows. The test item (or total test, or instructional unit, depending on the level of analysis) is identified with the system whose reliability is being assessed. The student's "attacks" on the item correspond to the shocks or wear and tear to which the system is subjected, and the eventual solution of the item is the failure of the system. Farfetched as such identifications may seem, they are not unreasonable. It is plausible to imagine the student to be intent on "cracking the system" by answering the item correctly. The time he takes in doing so (the "response time") corresponds to the "survival time" (or "fatigue life") of the system. The only difference is that, whereas in system-reliability analysis we want the survival time to be as long as possible, in test-response data we want it to be as short as possible, especially in criterion-referenced tests. Thus, the use of the Weibull distribution in time-score analysis has some intuitive appeal.

Another reason that encourages at least examining the Weibull distribution for the purpose at hand is that the two-parameter negative exponential distribution advocated by Bree can be regarded as a special case of the Weibull distribution (a three-parameter family) when one of its parameters is equated to unity. (See next section for mathematical demonstration.) When it is recalled that Bree's data base was quite limited (comprising solving-time data from three problems originally fitted to gamma distributions by Restle and Davis (1962) plus those for a fourth problem taken from another source), it is not inconceivable that these sets of data happened to be well modeled by this special case of the Weibull distribution. If so, the psychological arguments invoked by Bree to provide a rationale for the two-parameter negative exponential distribution may hold also for the Weibull distribution.

Thus the thrust of our contemplated study shifted from a gamma vs. negative-exponential comparison to a more general one of investigating the usefulness of the Weibull distribution as a model for time-score data from CR tests in the context of CAI. What is reported in the sequel therefore includes but is not confined to a comparison of the gamma and Weibull distributions. It also includes attempts to relate the three parameters of the latter distribution to various psychometrically meaningful indices associated with CR tests and their constituent items, such as difficulty level, ability to differentiate between masters and nonmasters, and so forth.



2. THE WEIBULL DISTRIBUTION: RATIONALE AND DERIVATION 

Although an intuitive rationale for the applicability of the Weibull distribution for item (or test) response time was given in the introduction by identifying the solution of an item by a student with the failure of a system in system-reliability theory, this rationale does not lead to a derivation of the distribution (or density) function. In other words, the rationale stated earlier is far from being a set of axioms or postulates from which the mathematical form of the density function logically flows. In the final analysis, as Weibull himself (1951) and subsequent expositors (e.g., Mann, Schafer and Singpurwalla, 1974) have said, the distribution was empirically discovered rather than axiomatico-deductively derived in the first place. Nevertheless, if something even remotely resembling a postulate (or set of postulates) can be found that makes intuitive sense and at the same time logically implies the mathematical expression for the distribution function, this would add greatly to the credibility of the distribution. Such a basis has been postulated (albeit as an ex post facto rationalization) by system-reliability researchers in terms of the concept of hazard rate, which is essentially the conditional probability that a system which has survived through time t will fail during an infinitesimal time interval immediately thereafter. Translated to fit the context of item response time, this may be dubbed the conditional response rate and is defined in the following subsection.



2.1 Derivation Based on the Conditional Response Rate



Let us denote by f(t) the probability density that a person randomly selected from the population will respond to a given test item (or any other unit of a test) during the infinitesimal time interval [t, t + dt]. (The actual probability that the person will respond to the item in this time interval is f(t)dt.) Then the proportion of individuals who will have responded to the item by time t is

\[
F(t) = \int_{0}^{t} f(u)\,du,
\]

which is the (cumulative) distribution function. It follows that the proportion of individuals who have not responded to the item by time t is 1 - F(t). Consequently, the conditional probability that a person will respond to the item during the interval [t, t + dt], given that he or she has not responded to the item up to time t, is, by the definition of a conditional probability, given by

\[
P(\text{responds in interval } [t, t+dt] \mid \text{has not responded by time } t) = \frac{f(t)\,dt}{1 - F(t)}.
\]

(From the definition of conditional probability, one might expect to find in the numerator the probability of the joint event "has not responded by time t and responds in interval [t, t+dt]." However, a little reflection shows that the simple event "responds in interval [t, t+dt]" automatically implies "has not responded by time t." Hence the former simple event is synonymous with the joint event cited, and their probabilities are identical.) The conditional response rate (CRR) is defined by the expression above exclusive of the differential element dt, and we symbolize it by h(t), keeping the notation commonly used for hazard rate in system-reliability theory. Thus

\[
h(t) = \frac{f(t)}{1 - F(t)}. \tag{2.1}
\]
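
Equation (2.1) also suggests a direct way to inspect the CRR in observed data. The following is a minimal Python sketch, not part of the original report: it assumes nonnegative response times grouped into bins of equal width, and the function name `empirical_crr` and the binning scheme are illustrative choices rather than the procedure used in the study.

```python
def empirical_crr(times, bin_width):
    """Estimate h(t) on consecutive bins [k*w, (k+1)*w) from observed times.

    For each bin, the estimate is
        (number responding in the bin) / (number not yet responded * w),
    a discrete analogue of f(t) / (1 - F(t)) in equation (2.1).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not times or min(times) < 0:
        raise ValueError("times must be a non-empty list of nonnegative values")

    n_bins = int(max(times) // bin_width) + 1
    counts = [0] * n_bins
    for t in times:
        counts[int(t // bin_width)] += 1

    rates = []
    at_risk = len(times)  # persons who have not responded before the bin starts
    for k, responded in enumerate(counts):
        rate = responded / (at_risk * bin_width) if at_risk else 0.0
        rates.append((k * bin_width, rate))
        at_risk -= responded
    return rates


if __name__ == "__main__":
    # Hypothetical response times in seconds, for illustration only.
    for start, rate in empirical_crr([1.0, 2.0, 2.5, 7.0], bin_width=2.0):
        print(f"bin starting at {start:4.1f}: estimated CRR = {rate:.4f}")
```

With very few persons still "at risk" in the late bins, such estimates become unstable, which is one reason a parametric form for h(t) is attractive.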



From the concept of hazard rate in general, the corresponding distribution and density functions may easily be derived by elementary calculus, as follows. "Tacking on" the differential element dt on both sides of equation (2.1), replacing f(t)dt by the differential element dF(t) of F(t), and further writing u in place of t (in anticipation of using t for the upper limit of a definite integral), we obtain

\[
h(u)\,du = \frac{dF(u)}{1 - F(u)}.
\]

Integrating both sides from a lower limit $u = t_0$ to a general upper limit t, we get

\[
\int_{t_0}^{t} h(u)\,du = -\ln[1 - F(u)]\,\Big|_{u=t_0}^{u=t}
= \ln[1 - F(t_0)] - \ln[1 - F(t)]
= -\ln[1 - F(t)],
\]

if we let $t_0$ be the lower limit of the range of t so that $F(t_0) = 0$. It then follows that

\[
1 - F(t) = \exp\Big[-\int_{t_0}^{t} h(u)\,du\Big],
\]

or

\[
F(t) = 1 - \exp\Big[-\int_{t_0}^{t} h(u)\,du\Big]. \tag{2.2}
\]

Taking derivatives of both sides, we get

\[
f(t) = h(t)\,\exp\Big[-\int_{t_0}^{t} h(u)\,du\Big]. \tag{2.3}
\]
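
To make the step from a conditional response rate to a distribution concrete, equations (2.2) and (2.3) can be evaluated numerically for any h(t). The Python sketch below is an illustration only, not code from the report; the function names and the midpoint-rule integration are assumptions made for the example.

```python
import math


def cdf_from_hazard(h, t0, t, steps=10_000):
    """Equation (2.2): F(t) = 1 - exp(-integral of h(u) du from t0 to t).

    The integral is approximated with the midpoint rule on `steps` subintervals.
    """
    if t <= t0:
        return 0.0
    du = (t - t0) / steps
    integral = sum(h(t0 + (i + 0.5) * du) for i in range(steps)) * du
    return 1.0 - math.exp(-integral)


def density_from_hazard(h, t0, t, steps=10_000):
    """Equation (2.3): f(t) = h(t) * exp(-integral) = h(t) * (1 - F(t))."""
    return h(t) * (1.0 - cdf_from_hazard(h, t0, t, steps))


if __name__ == "__main__":
    # A constant rate of 0.5 per unit time (a hypothetical value) should
    # reproduce the negative exponential distribution.
    constant = lambda u: 0.5
    print(cdf_from_hazard(constant, 0.0, 2.0), 1.0 - math.exp(-1.0))
```

A constant h(t) gives back the negative exponential distribution, which anticipates the c = 1 case discussed later in this section.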



The last two equations express the distribution function and the density function, respectively, as functions of the CRR h(t) in general. Substituting particular expressions for h(t) in these equations gives rise to particular distribution and density functions. The Weibull distribution results essentially when it is assumed that h(t) is a monotonically increasing function of t, is independent of t, or is a monotonically decreasing function of t. (That is, we forbid h(t) from being a function that first increases with t, reaches a maximum, and then decreases with t, or the other way around. Of course, more complicated behaviors are also forbidden.) Actually, we need to be slightly more specific than merely requiring h(t) to be a monotonic function of t; we must require it to be a monotonic power function of t (like $t^m$). We further write the expression in a more elaborate form in order to have a "neat" expression for the resulting probability density and distribution functions. Specifically, we postulate that

\[
h(t) = \frac{c}{\mu_0^{c}}\,(t - t_0)^{c-1}. \tag{2.4}
\]

Although this expression looks highly contrived, the multiplier $c/\mu_0^{c}$ may, at this point, be regarded simply as a proportionality constant, and the subtraction of $t_0$ from t merely reflects the fact that $t_0$ is the effective "zero point" on the t scale, for no value of t smaller than this can exist, by the definition of $t_0$ given above. Thus, the expression is no more than a "plain" power function $t^m$ with a shift in origin and a rescaling factor.

It is evident from expression (2.4) that h(t) is an increasing function of t, a constant, or a decreasing function of t, according as c > 1, c = 1, or c < 1, respectively, as illustrated in Figure 1. From the meaning of h(t), the intuitive (although somewhat loose) interpretations of the three cases are as follows:

1. When c > 1, the longer a person persists with the item without responding to it, the more likely it becomes that he/she will answer it "the next moment" (which is roughly what the interval [t, t+dt] means);

2. When c = 1, the chances that a person will respond the next moment, when he/she hasn't responded so far, neither increase nor decrease with time;

3. When c < 1, the longer a person persists with the item without responding to it, the less likely it becomes that he/she will answer it the next moment.

[Figure 1. The conditional response rate (CRR) h(t) for three choices of parameter c.]
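
The three cases can be checked numerically. The short Python sketch below evaluates expression (2.4) at a few time points; it is an illustration only, and the parameter values and the helper `trend` are hypothetical choices, not taken from the report.

```python
def weibull_crr(t, t0, mu0, c):
    """Expression (2.4): h(t) = (c / mu0**c) * (t - t0)**(c - 1), for t > t0."""
    if t <= t0:
        raise ValueError("this sketch evaluates h(t) only for t > t0")
    return (c / mu0 ** c) * (t - t0) ** (c - 1)


def trend(values):
    """Classify a sequence as increasing, constant, decreasing, or mixed."""
    pairs = list(zip(values, values[1:]))
    if all(b > a for a, b in pairs):
        return "increasing"
    if all(b == a for a, b in pairs):
        return "constant"
    if all(b < a for a, b in pairs):
        return "decreasing"
    return "mixed"


if __name__ == "__main__":
    times = [3.0, 5.0, 10.0, 20.0]          # hypothetical time points
    for c in (2.0, 1.0, 0.5):               # one value from each case
        rates = [weibull_crr(t, t0=2.0, mu0=15.0, c=c) for t in times]
        print(f"c = {c}: h(t) is {trend(rates)}")
```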



It is intuitively plausible that items of all three kinds may exist in practice, depending on the difficulty and other properties of the item. (In particular, the second case may correspond to an item whose solution depends on a sudden insight, the occurrence of which is independent of how long the person has been at the item so far.) Thus, the distribution which results from substituting expression (2.4) in equations (2.2) and (2.3) will be quite a flexible one which can model a wide variety of types of items depending on the value of the parameter c, which is hence a crucial one.

Making the stated substitutions and carrying out the integration called for, we obtain

\[
F(t) =
\begin{cases}
1 - \exp\Big[-\Big(\dfrac{t - t_0}{\mu_0}\Big)^{c}\Big] & \text{for } t \ge t_0, \\[2ex]
0 & \text{for } t < t_0
\end{cases}
\tag{2.5}
\]

as the Weibull distribution function and

\[
f(t) =
\begin{cases}
\dfrac{c}{\mu_0}\Big(\dfrac{t - t_0}{\mu_0}\Big)^{c-1} \exp\Big[-\Big(\dfrac{t - t_0}{\mu_0}\Big)^{c}\Big] & \text{for } t \ge t_0, \\[2ex]
0 & \text{for } t < t_0
\end{cases}
\tag{2.6}
\]

as the Weibull density function. In the system-reliability theory literature the three parameters $t_0$, $\mu_0$, and c are referred to as the location, scale, and shape parameters, respectively. Since we let $t_0$ be the lower limit of the range of t in the general derivation of F(t) from h(t) above, it is clear that this parameter is the theoretical value of t such that $\text{prob}(t < t_0) = 0$. Thus it is natural to call this the location parameter. The scale parameter $\mu_0$ specifies the $100(1 - e^{-1})$ percent point of the distribution of $t - t_0$ [i.e., $\text{prob}(t < t_0 + \mu_0) = 1 - e^{-1} \approx .632$], as may readily be verified by letting $t = t_0 + \mu_0$ in the expression for the distribution function F(t) given in equation (2.5). The shape parameter c is the most interesting of the three, for it determines the general shape assumed by the density function. If c < 1, there is no mode and the density function decreases monotonically with t. If c > 1, the distribution is unimodal and skewed, with mode at $t_0 + \mu_0 (1 - 1/c)^{1/c}$. Interestingly, the skewness changes from positive to negative at approximately c = 3.60. Figure 2 shows the density functions of Weibull distributions with $t_0 = 2$, $\mu_0 = 15$, and four selected values of c.
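
The roles of the three parameters can be verified with a few lines of code. The Python sketch below implements equations (2.5) and (2.6) and the mode formula quoted above; it is an illustration only. The function names and the treatment of the boundary point t = t0 are choices made for the example, and the value c = 2.5 is hypothetical.

```python
import math


def weibull_cdf(t, t0, mu0, c):
    """Equation (2.5)."""
    if t < t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / mu0) ** c))


def weibull_pdf(t, t0, mu0, c):
    """Equation (2.6)."""
    if t < t0:
        return 0.0
    z = (t - t0) / mu0
    if z == 0.0:
        # Boundary point: use the limit of the density as t approaches t0 from above.
        return math.inf if c < 1 else (1.0 / mu0 if c == 1 else 0.0)
    return (c / mu0) * z ** (c - 1) * math.exp(-(z ** c))


def weibull_mode(t0, mu0, c):
    """Mode t0 + mu0 * (1 - 1/c)**(1/c); defined here only for c > 1."""
    if c <= 1:
        raise ValueError("the density has an interior mode only when c > 1")
    return t0 + mu0 * (1.0 - 1.0 / c) ** (1.0 / c)


if __name__ == "__main__":
    t0, mu0, c = 2.0, 15.0, 2.5
    # The scale parameter marks the 100(1 - 1/e) percent point, whatever c is.
    print(weibull_cdf(t0 + mu0, t0, mu0, c), 1.0 - math.exp(-1.0))
    print("mode:", weibull_mode(t0, mu0, c))
```

Note that F(t0 + mu0) does not depend on c, which is why the scale parameter can be read off an empirical cumulative response curve independently of the shape.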

The mean and variance of a random variable following the Weibull distribution $W(t_0, \mu_0, c)$ are as follows:

\[
E(t) = t_0 + \mu_0\,\Gamma(1 + 1/c) \tag{2.7}
\]

and

\[
\mathrm{Var}(t) = \mu_0^{2}\,\big[\Gamma(1 + 2/c) - \Gamma^{2}(1 + 1/c)\big], \tag{2.8}
\]



where $\Gamma(\cdot)$ is the gamma function, defined as

\[
\Gamma(m) = \int_{0}^{\infty} e^{-u}\,u^{m-1}\,du \qquad [= (m-1)! \text{ when } m \text{ is a positive integer}].
\]

[Figure 2. Density functions of Weibull distributions with $t_0 = 2$, $\mu_0 = 15$, and four selected values of c.]
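
Equations (2.7) and (2.8) are easy to evaluate with a standard gamma-function routine. The Python sketch below is an illustration only; it uses the location and scale values quoted for Figure 2, while the four values of c are hypothetical, since those used in the figure are not recoverable here.

```python
import math


def weibull_mean(t0, mu0, c):
    """Equation (2.7): E(t) = t0 + mu0 * Gamma(1 + 1/c)."""
    return t0 + mu0 * math.gamma(1.0 + 1.0 / c)


def weibull_variance(mu0, c):
    """Equation (2.8): Var(t) = mu0**2 * [Gamma(1 + 2/c) - Gamma(1 + 1/c)**2]."""
    g1 = math.gamma(1.0 + 1.0 / c)
    g2 = math.gamma(1.0 + 2.0 / c)
    return mu0 ** 2 * (g2 - g1 ** 2)


if __name__ == "__main__":
    t0, mu0 = 2.0, 15.0
    for c in (0.5, 1.0, 2.0, 3.6):  # hypothetical shape values
        print(f"c = {c}: mean = {weibull_mean(t0, mu0, c):.3f}, "
              f"variance = {weibull_variance(mu0, c):.3f}")
```

For c = 1 the formulas reduce to a mean of t0 + mu0 and a variance of mu0 squared, the familiar values for a shifted negative exponential distribution.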

2.2 Comparison of Several Related Distributions



As a matter of incidental interest, as well as a possible aid in subsequent discussions relating the value of the shape parameter c to the nature of the item or other unit of a test, we display the density functions of several related (or in some sense similar) distributions and also indicate what each of them reduces to when c = 1.

The density function of the two-parameter gamma distribution used by Rasch (1960) as a model for the distribution of time taken to read a passage of N words is, in a notation consistent with what we are using for the Weibull distribution,

\[
f_{2g}(t) = \Big[\frac{1}{\mu_0}\Big(\frac{t}{\mu_0}\Big)^{c-1} \exp\Big(-\frac{t}{\mu_0}\Big)\Big] \Big/ \Gamma(c). \tag{2.9}
\]

The N in Rasch's equation (6.6) corresponds to our c, and his $\lambda$ to our $1/\mu_0$. Equation (2.9) is equivalent also to that given by Restle and Davis (1962) in their k-stage model for problem solving, where k = c. To serve in either the Rasch or the Restle-Davis model, the c in equation (2.9) must thus be an integer, but there is no such requirement in the density function (2.6) of the Weibull distribution. If we let c = 1 in equation (2.9), we get the density function for the one-parameter negative exponential distribution:

\[
f_{1e}(t) = \frac{1}{\mu_0} \exp\Big(-\frac{t}{\mu_0}\Big). \tag{2.10}
\]

Again, $1/\mu_0$ is customarily written as $\lambda$ and called the intensity parameter.

On the other hand, if we let c = 1 in equation (2.6), we get the density function of the two-parameter negative exponential distribution

\[
f_{2e}(t) = \frac{1}{\mu_0} \exp\Big(-\frac{t - t_0}{\mu_0}\Big), \tag{2.11}
\]

which is the model found by Bree (1975) to offer a better fit to the distributions of solving times for Restle and Davis' three problems than did the two-parameter gamma distribution (2.9). Bree called (2.11) the negative exponential distribution with shift in location. In other words, this density function starts at $t = t_0$ instead of t = 0, as does the one-parameter negative exponential distribution (2.10).

Now it is well known that when c is an integer greater than 1, the two-parameter gamma distribution (2.9) is a c-fold convolution of the one-parameter negative exponential distribution (2.10). In other words, if there are c independent random variables $t_1, t_2, \ldots, t_c$, each following the one-parameter negative exponential distribution (2.10), then their sum $t_1 + t_2 + \cdots + t_c$ follows the two-parameter gamma distribution (2.9). [Thus Rasch's model for the distribution of reading time for an N-word passage amounts to saying that the reading times for each word follow a negative exponential distribution and that the N distributions are statistically independent. Similar remarks hold for Restle and Davis' k-stage problem-solving model.]
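
The convolution property can be checked numerically for small c. The Python sketch below is an illustration only, not code from the report: it convolves a density with the one-parameter negative exponential density (2.10) by the midpoint rule and compares the result with the gamma density (2.9). The function names and the value mu0 = 4 are hypothetical.

```python
import math


def exp_pdf(t, mu0):
    """Equation (2.10), the one-parameter negative exponential density."""
    return math.exp(-t / mu0) / mu0 if t >= 0 else 0.0


def gamma2_pdf(t, mu0, c):
    """Equation (2.9), the two-parameter gamma density (c >= 1 in this sketch)."""
    if t < 0:
        return 0.0
    return (t / mu0) ** (c - 1) * math.exp(-t / mu0) / (mu0 * math.gamma(c))


def convolve_with_exp(pdf, t, mu0, steps=4000):
    """Density of X + Y at t, where X has density `pdf` on [0, inf) and
    Y follows (2.10): the integral of pdf(u) * exp_pdf(t - u) over 0 <= u <= t."""
    if t <= 0:
        return 0.0
    du = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += pdf(u) * exp_pdf(t - u, mu0)
    return total * du


if __name__ == "__main__":
    mu0, t = 4.0, 6.0
    two_fold = convolve_with_exp(lambda u: exp_pdf(u, mu0), t, mu0)
    print(two_fold, gamma2_pdf(t, mu0, 2))
    three_fold = convolve_with_exp(lambda u: gamma2_pdf(u, mu0, 2), t, mu0)
    print(three_fold, gamma2_pdf(t, mu0, 3))
```

Each further convolution with (2.10) raises c by one, which is the content of the c-stage interpretation of the gamma model.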



In analogy to the fact just stated, that the two-parameter gamma distribution (for integer c) is a c-fold convolution of the one-parameter negative exponential distribution, it might be tempting to jump to the conclusion that the Weibull distribution (2.6) with integer c (> 1) is a c-fold convolution of the two-parameter negative exponential distribution (2.11), in view of the fact that (2.6) reduces to (2.11) when c = 1. This is not the case, however. Rather, a c-fold convolution of the two-parameter negative exponential distribution gives rise to the three-parameter gamma distribution having the density function

\[
f_{3g}(t) = \Big[\frac{1}{\mu_0}\Big(\frac{t - t_0}{\mu_0}\Big)^{c-1} \exp\Big(-\frac{t - t_0}{\mu_0}\Big)\Big] \Big/ \Gamma(c). \tag{2.12}
\]

Note that letting c = 1 in this equation also leads to equation (2.11). Thus, the Weibull distribution and the three-parameter gamma distribution have in common the property that they both reduce to the two-parameter negative exponential distribution when c = 1.

A comparison of equations (2.6) and (2.12) shows that the density functions of the Weibull and the three-parameter gamma distributions are strikingly similar. Aside from the absence of the normalizing constant $1/\Gamma(c)$, (2.6) differs from (2.12) only in the presence of c at two places where it is lacking in (2.12). Thus, in a sense, the Weibull distribution may be regarded as a somewhat generalized form of the three-parameter gamma distribution.



3. PARAMETER ESTIMATION

It must be conceded that the methods we used to estimate the
parameters of the Weibull distribution and those of the two-parameter
gamma were not the best possible. But, operating as we were under tight
time constraints, and since PLATO IV uses a time-sharing mode with rather
limited storage capacity allocated to any one user, we had to make do
with relatively simple methods with reasonable accuracy. It might be
mentioned in passing that PLATO V terminals, each equipped with a
microcomputer of its own, are becoming more and more widely available,
and they would circumvent many of the limitations under which we operated.

It was unfortunate also that we did not become cognizant of
the three-parameter gamma distribution early enough to include it among
the distributions to be fitted to our data. However, a comparison of
equations (2.6) and (2.12) suggests that not much has been lost, since
the two density functions are remarkably similar, as noted earlier, and
if anything the Weibull distribution appears to have a slight edge on
the gamma in flexibility. Whether this is indeed so must await future
research, however.

3.1 The Weibull-Distribution Parameters 

The problem of estimating the parameters of a Weibull
distribution has been the subject of a number of papers (e.g., Harter and
Moore, 1965; Johns and Lieberman, 1966; Mann, 1967, 1969; Lemon, 1974).
Most of these, however, either deal with two-parameter versions of the
Weibull distribution (i.e., when one or another of the three parameters
is assumed known) or present iterative methods whose programming appears
to be an enormous job. Believing that the maximum likelihood method
would be the most accurate, and before becoming familiar with the
papers just cited (since Sato [1971] spoke only of a rudimentary method
using a special Weibull probability paper), one of the present authors
derived the likelihood equations and struggled for some time to solve
them. He concluded (correctly, as it turned out) that they were capable
of solution only by tedious, iterative methods. Since time was of the
essence, he abandoned the maximum likelihood approach (see footnote 2)
and improvised a rough-and-ready method based on linear regression, as
follows.

Footnote 2. He subsequently became aware that a computer program listing
for this method could be obtained from H. L. Harter, Aerospace Research
Laboratories, Wright-Patterson AFB, Ohio. But even adopting and
implementing this on the PLATO system would have been a lengthy task.



We first rewrite equation (2.5) for the Weibull distribution
function as

$$1 - P = \exp\left[-\left(\frac{t - t_0}{\mu_0}\right)^{c}\right],$$

where F(t) has been denoted by P for short, it being understood that it
is a function of t and corresponds to the observed proportion of
examinees who respond to an item by a given time t. Taking the natural
logarithms of both sides of this equation gives

$$\ln(1-P) = -\left(\frac{t - t_0}{\mu_0}\right)^{c}.$$

Changing the signs of both sides and taking their natural logarithms
again yields

(3.1)  $\ln\ln(1-P)^{-1} = c\,\ln(t-t_0) + \ln(\mu_0^{-c}).$

If we now let

(3.2)  $\ln\ln(1-P)^{-1} = Y,$

(3.3)  $\ln(t-t_0) = X$

and

(3.4)  $\ln(\mu_0^{-c}) = a,$

equation (3.1) becomes

(3.5)  $Y = cX + a,$

which looks just like an ordinary linear regression equation of Y on X.
The only (but big) difference is that X itself is not completely
observable, because it depends on one of the unknown parameters, $t_0$, as
equation (3.3) shows. [Note that if we were dealing with a two-parameter
Weibull distribution with $t_0$ known (usually, $t_0 = 0$) then (3.5)
would indeed be a regular linear regression equation, and the estimation
of c and a would be a simple matter.]






Therefore, we had to resort to a trial-and-error method to
estimate $t_0$ first, and then apply the standard methods of linear
regression to estimate c and a, from which in turn $\mu_0$ is determined
via equation (3.4). The principle adopted for guiding the trial-and-error
procedure was to maximize the correlation between $Y = \ln\ln(1-P)^{-1}$
(which is observable) and $X = \ln(t-t_0)$, which becomes an observable
once some value is given to $t_0$. The search started by dividing the
interval $[0,\ t_{\min} - t_{\min}/200]$ into 20 subintervals (where
$t_{\min}$ is the smallest observed response time) and calculating
$r_{xy}$ with $t_0$ given trial values equal to the endpoints of these
subintervals. Next, the (closed) interval between the trial value of
$t_0$ yielding the largest value for $r_{xy}$ and the adjacent one giving
the next largest value for $r_{xy}$ was divided into ten equal
subintervals and their endpoints were taken as the second set of trial
values for $t_0$ with which to calculate $r_{xy}$. Finally, the interval
between the trial values among this set that yielded the two largest
values for $r_{xy}$ was again divided into ten subintervals and their
endpoints were taken as the third set of trial values of $t_0$ with
which to calculate $r_{xy}$. The optimal among these trial values was
taken as our final estimate $\hat{t}_0$ of $t_0$.



Once the estimate $\hat{t}_0$ is determined, $X = \ln(t-\hat{t}_0)$ is
calculable for each observed value of t, and thence $\hat{c}$ is computed
from

$$\hat{c} = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n},$$

where n is the number of observed response times. Then $\hat{a}$ is
computed as $\hat{a} = \bar{Y} - \hat{c}\bar{X}$, and $\hat{\mu}_0$ is
obtained by solving equation (3.4) for it:

$$\hat{\mu}_0 = (\exp \hat{a})^{-1/\hat{c}}.$$

This completes our estimation of the three Weibull parameters,
rough-and-ready though it is. In the subsequent sections we omit the
circumflexes and write $t_0$, $\mu_0$ and c for these estimates to
simplify the notation, since we will not need to refer to the true
parameter values.
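The regression-based procedure of this section can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the TUTOR program used in the study. The function names are hypothetical. The cumulative proportion is taken as P = i/(n + 1) so that 1 - P stays positive at the largest time, a choice the text does not specify. All three search stages narrow to the best trial value and its better adjacent neighbor, which simplifies the wording of the third stage slightly. Response times are assumed to be strictly positive.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_weibull(times):
    """Estimate (t0, mu0, c) by maximizing r_xy over t0, then regressing Y on X."""
    ts = sorted(times)
    n = len(ts)
    # Assumption: P_i = i/(n+1), so that ln(-ln(1 - P)) is defined for every point.
    ys = [math.log(-math.log(1.0 - (i + 1) / (n + 1))) for i in range(n)]

    def xs_for(t0):
        return [math.log(t - t0) for t in ts]

    lo, hi = 0.0, ts[0] - ts[0] / 200.0      # search interval [0, t_min - t_min/200]
    best = lo
    for parts in (20, 10, 10):               # three successively finer grids
        grid = [lo + (hi - lo) * k / parts for k in range(parts + 1)]
        rs = [pearson_r(xs_for(t0), ys) for t0 in grid]
        i = max(range(len(grid)), key=rs.__getitem__)
        best = grid[i]
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(grid)]
        j = max(neighbours, key=rs.__getitem__)
        lo, hi = sorted((grid[i], grid[j]))  # narrow to best point and better neighbour

    xs = xs_for(best)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    c = (sxy - sx * sy / n) / (sxx - sx * sx / n)   # slope of Y on X
    a = sy / n - c * sx / n                          # intercept
    mu0 = math.exp(a) ** (-1.0 / c)                  # from equation (3.4)
    return best, mu0, c


# Usage on synthetic, noise-free Weibull quantiles (illustrative parameter values).
true_t0, true_mu0, true_c, n_demo = 2.0, 5.0, 1.5, 200
demo = [true_t0 + true_mu0 * (-math.log(1 - (i + 1) / (n_demo + 1))) ** (1 / true_c)
        for i in range(n_demo)]
t0_hat, mu0_hat, c_hat = fit_weibull(demo)
print(t0_hat, mu0_hat, c_hat)
```

Because the grid never reaches $t_{\min}$ itself, every logarithm in X is defined; with real data the correlation surface may be flatter than in this noise-free example.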



3.2 The Gamma-Distribution Parameters

Here, too, the maximum likelihood estimates would probably
have been the most desirable, but due to the limitations already
mentioned we again adopted a simpler method, the method of moments in
this case. Since only two parameters have to be estimated, it suffices
to express the theoretical mean and variance in terms of the parameters
and equate these expressions to the observed mean and variance,
respectively.

The required expressions, computable from equation (2.9), are

$$E(t) = \mu_0 c$$

and

$$\mathrm{Var}(t) = \mu_0^2 c.$$

It readily follows that

$$\mu_0 = \mathrm{Var}(t)/E(t)$$

and

$$c = [E(t)]^2/\mathrm{Var}(t).$$

Hence,

$$\hat{\mu}_0 = s_t^2/\bar{t}$$

and

$$\hat{c} = (\bar{t})^2/s_t^2$$

may be taken as estimates for $\mu_0$ and c, where $\bar{t}$ and $s_t^2$
are the sample mean and variance, respectively. In the sequel, we write
a for $\hat{c}$ and p for $\hat{\mu}_0$ to avoid confusion with the
corresponding Weibull parameter estimates.
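The method-of-moments estimates above amount to two divisions. A minimal sketch follows; the function name is hypothetical, and the use of the n - 1 denominator for the sample variance is an assumption, since the text does not say which denominator was used.

```python
import statistics


def gamma_moment_estimates(times):
    """Return (mu0_hat, c_hat) for the two-parameter gamma by matching moments."""
    mean = statistics.mean(times)
    var = statistics.variance(times)   # assumption: n - 1 denominator
    mu0_hat = var / mean               # scale: Var(t) / E(t)
    c_hat = mean ** 2 / var            # shape: E(t)^2 / Var(t)
    return mu0_hat, c_hat


# Usage with four made-up response times (seconds).
mu0_hat, c_hat = gamma_moment_estimates([2.0, 4.0, 6.0, 8.0])
print(mu0_hat, c_hat)
```

By construction the product of the two estimates reproduces the sample mean, which gives a quick sanity check on any implementation.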

4. DESCRIPTION OF DATA 



As mentioned earlier, many different data sets were used in
this study. Some of them were from lessons (which include instructional
segments, exercises, and quizzes) and tests (pre and post) on matrix
algebra that were developed by one of the present authors for the dual
purpose of serving as a self-study course to accompany several statistics
courses taught by the other author and two of his colleagues in the
Educational Psychology and Psychology Departments of the University of
Illinois at Urbana-Champaign and of gathering data for this study.
Others came from over 30 lessons, and their accompanying tests, on
general and special vehicle maintenance training developed by the
Chanute Air Force Base Computer-Based Education (CBE) Project Group
under the sponsorship of the Advanced Research Projects Agency (ARPA)
of the Department of Defense.



4.1 Matrix Algebra Lessons and Tests

The matrix algebra course, written on the PLATO system by one
of the present authors with some assistance from one of her associates,
is intended for graduate students in educational and psychological
statistics (particularly multivariate statistics) who do not have much
mathematical background. Topics covered include the basic definitions
and simple operations of matrix algebra, matrix multiplication, matrix
inversion (including the definition and calculation of determinants),
linear transformations and axis rotations, and eigenvalue problems. The
course is divided into five lessons corresponding to the above topics,
and their average completion times range from 20 minutes to 2 hours per
lesson. (See Appendix A for several sample pages of the course.)

The PLATO system permits a student to make any number of
passes through any instructional unit, which may be the actual
instructional segment, a set of exercises, or a quiz, and which is
called an "area" in PLATO terminology. Each area is identified by the
lesson number preceded by the letter i, e or q (for instruction,
exercise or quiz), and followed by the instructional segment number
within that lesson. Thus, for example, "i036" refers to the sixth
instructional segment in lesson 3, while "e036" refers to the exercise
set for the sixth instructional segment in lesson 3. An exception occurs
in lesson 5, which contains but one instructional segment as such (i051)
followed by three exercise sets (e051, e052 and e053) of the
problem-solving type to augment the instruction. There are 36 areas in
all, whose codes and content matter are listed in Table 1.
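The area-code convention just described can be illustrated with a small parser. This is a sketch only: the names `Area` and `parse_area` are hypothetical, and the layout (one letter, a two-digit lesson number, a one-digit segment number) is inferred from the examples "i036" and "e053" rather than from any PLATO documentation.

```python
from typing import NamedTuple

# Letter prefixes as described in the text.
KINDS = {"i": "instruction", "e": "exercise", "q": "quiz"}


class Area(NamedTuple):
    kind: str      # "instruction", "exercise" or "quiz"
    lesson: int    # lesson number, e.g. 3
    segment: int   # segment number within the lesson, e.g. 6


def parse_area(code):
    """Split a code such as 'i036' into its kind, lesson and segment."""
    if len(code) != 4 or code[0] not in KINDS or not code[1:].isdigit():
        raise ValueError(f"not a valid area code: {code!r}")
    return Area(KINDS[code[0]], int(code[1:3]), int(code[3]))


print(parse_area("i036"))
print(parse_area("e053"))
```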

The set of data for any area includes, among other things,
the name or ID number of each student who went through that area
completely at least once, the pass (or try) number, and the time he/she
took on each pass. For the purposes of data analysis, only the time
taken on the first pass (if completed) through each area for each
student was considered.



Table 1

List of Areas and their Content in Matrix Algebra Lessons

area    content

Introduction to Matrices
i011    Definitions and simple operations of a matrix
i012    Use PLATO as a calculator
e011    Eleven exercises

Matrix Multiplications
i021    Multiplication of two matrices A and B
e021    Four exercises
i022    Multiplication is not commutable, i.e., AB ≠ BA
e022    Four exercises
i023    Scalar product
e023    Four exercises
i024    Matrix product
e024    Four exercises
i025    Quadratic product
e025    Four exercises
i026    The principles of matrix operations
e026    Exercises
i027    Diagonal matrices
e027    Four exercises
i028    Scalar matrices and identity matrix
e028    Four exercises

Determinant and Inversion of a Matrix
i031    Identity matrix
q031    Five item quiz
i032    Definition of the determinant of a matrix
i033    Evaluation of the determinant of a matrix
q033    Five item quiz
i034    Cofactors, expansion of a determinant
e034    Exercises for cofactors, expansion of a determinant
i035    Properties of determinants
i036    Adjoint and inverse of a matrix A

Matrix and Linear Transformations
i041    An example of linear transformation: axis rotation
i042    Properties of orthogonal transformations
i043    SSCP matrix

Eigenvalues and Eigenvectors
i051    Definition of eigenvalues and eigenvectors
e051    Calculate eigenvalues
e052    Calculate eigenvectors
e053    Normalization of eigenvectors



A 48-item pretest was given to all students before they
studied the matrix algebra course. (See Appendix B for a list of the
items plus a sample item as it appears on the PLATO screen.) The
original version of this pretest was constructed a year and a half
before Fall 1976, and had been used in the multivariate statistics
course. It was designed to minimize guessing by permitting students who
did not know the subject matter related to a given item to omit it and
go on to the next by pressing the NEXT key without having to choose any
of the multiple-choice options in the earlier item. There were 88
students who tried every item and were thus likely to have taken the
test seriously in an earnest desire to find out their initial level of
knowledge. The data for these 88 students are referred to as the
"pre-revision data" in the sequel. After all 48 items have been
answered, the pretest provides feedback by indicating which option the
student chose for each item, the correct option for that item and, at
the very end, a recommendation as to which lesson the student should
start from.

In Fall 1976, a revision of the pretest was undertaken in
light of information obtained from the original version. Some displays,
wordings and options were changed, but the biggest change was that the
NEXT key could no longer be used without choosing some option in each
item, thus forcing students to respond to every item. The feedback
system was retained in the revised version, however. Data from the new
version of the pretest are referred to as the "post-revision data"
below.

At the same time, a posttest for the first two lessons
combined (simple operations and matrix multiplication) and one for each
of the other lessons (lesson 3, matrix inversion; lesson 4,
transformations; and lesson 5, eigenvalues and eigenvectors) were
implemented, and the time and performance scores on these posttests have
been collected since then. Only those who completed each lesson could
take the corresponding posttest.

Since most instructors of the relevant statistics courses
did not forcibly require all students in their classes to study the
matrix algebra lessons on PLATO, data for these lessons came mainly from
volunteers who selected the topics according to their own judgment. But
taking the pretest was requested by most instructors. Thus,
computer-managed instruction (CMI) was not carried out, and instead of
forcing the students to adopt a predetermined strategy, almost complete
freedom of choice of learning strategy was allowed the students. We
therefore did not develop a computer-managed router of the mastery
learning type. Instead, data collection routines were implemented within
the lessons and tests so that all the students' behavioral records were
collected. That is, for each student and each area, the time spent in
that area, the number of questions attempted (whether in an
instructional segment, an exercise or a quiz), the number of questions
ultimately answered correctly, the number of questions correctly
answered on the first try, and the number of times the student requested
and received on-line help were recorded. In addition, for the items in
the quizzes, the pretest and the posttest, more detailed data were
collected: the response time for each item, whether the item was
answered correctly or not, and the number of times the item was
attempted.



4.2 Chanute AFB CBE Project Lessons and Tests

These lessons have been developed over a period of more than
10 years, as a cooperative enterprise between the Chanute AFB CBE
Project group and members of the Military Training Center (MTC) group at
the Computer-Based Education Research Laboratory (CERL) of the
University of Illinois at Urbana-Champaign, for the purpose of training
special and general purpose vehicle repairpersons (Dallman, 1977).
There are 34 lessons, comprising about 30 hours of instruction, along
with a criterion-referenced test for each. The lessons are homogeneous
in subject matter (in the sense that they do not naturally form a
hierarchically organized set) and tutorial in style for the most part.
Nevertheless, they are arranged in a specific order and students must
achieve mastery in one lesson as assessed by the end-of-lesson test
before they can proceed to the next. If mastery is not achieved, they
must repeat the lesson. A listing of the contents of the lessons is
given in Appendix C.



The 34 associated tests consist mostly of matching and
multiple-choice items, and they vary from 5 to 20 items in length. Only
one pass is allowed through each test and no feedback is given. The
tests are called MVE (for Master Validation Exams) and are numbered to
correspond to the lessons; e.g., the test given at the end of lesson
101 is denoted MVE 101. The mastery levels are set at 80 percent, but
the cutoffs actually used are somewhere between 75 and 90 percent
correct.

A lesson is said to be validated when 90 percent of the
students have achieved mastery by getting 75 to 90 percent of its MVE
test items correct. The samples yielding the data for analysis in this
study consisted of about 30 students per lesson, though not necessarily
the same 30 each time. No modifications of lessons were made until all
the students finished them, and all lessons were validated (after which
they might be modified) between April and September, 1975, inclusive.

The data collected included test scores on the MVE tests,
completion time for each test, the completion time for each lesson each
time it was studied (which may be just once or several times, depending
on how quickly mastery was achieved), and the total time spent on
each lesson until mastery. The last-mentioned time is called the
"mastery time" for each lesson in the sequel. Unlike the matrix algebra
lessons, data are available only for entire lessons and not for their
constituent parts.



A flow chart of the lessons and tests in the Chanute AFB CBE
Project is shown in Figure 3.

4.3 Teaching Strategies and Lesson Styles 

Since the matrix algebra course and the Chanute AFB CBE Pro- 
ject course in motor vehicle maintenance differ considerably in their 
teaching strategies and lesson styles, we compare them here although 
some of the descriptions were already given above. 

Virtually every lesson in the Chanute course followed the
simple tutorial learning activity that can be characterized as a linear
series of instructions and questions. Every student is required to
proceed through the same material in each lesson regardless of prior
knowledge or ability. Since these students were first-year Air Force
draftees with only a high-school education for the most part, this
lesson style is probably well suited for them. They probably could not
be trusted with much freedom of choice.

By contrast, each lesson in the matrix algebra course has an
index page at the beginning, as illustrated in Figure 4. Each student
can choose a particular lesson segment covering the topic of his/her
choice. Since the students taking the matrix algebra course were all
graduate students in educational psychology, psychology or accountancy
(with a few from other departments), they were bright and motivated
enough to control their own learning activities, and hence this lesson
style was probably the best for them.

It should be noted that, in both courses, the posttest scores
were significantly higher on the average than were the pretest scores,
thus permitting us to infer that learning did take place regardless of
which lesson style was used.

To make somewhat more detailed comparisons, in the matrix
algebra course some topics are taught by drill and practice strategies
while others are taught by problem-solving strategies. The particular
strategy chosen was adapted to the nature of the topic. For instance,
simple subject matter such as matrix addition and multiplication is
taught with the aid of exercises, following the instructional segments,
that are designed to give students practice in calculations, while more
difficult material such as eigenvalues and eigenvectors is taught with
the help of exercises of the problem-solving type. All but one lesson
contained the provision of allowing the student to go back for review
to the preceding frame within any area (instructional segment, exercise
or quiz), and also to go clear back to the index page (see Figure 4).
In the latter case the student could choose to go to an area other than
that in which he/she was working before recalling the index. This
resulted in some messy data which had to be discarded in our analysis,



[Figure 3. Block diagram of student flow in Chanute AFB CBE Project.
Each lesson is followed by its MVE; the stages shown are:

  PRETEST:       50-item norm-referenced test, coef. α = 0.40
  BLOCK 1:       lessons 103, 104a, 104b, 105
  BLOCK TEST 1:  20-item test, α21 = 0.56
  BLOCK 2:       lessons 201a, 201b, 202b, 204, 205a, 205b, 206a, 206b,
                 206c, 207
  BLOCK TEST 2:  20-item test, α21 = 0.33
  BLOCK 3:       lessons 301, 303, 304, 305, 307, 308
  BLOCK TEST 3:  20-item test, α21 = 0.47
  BLOCK 4:       lessons 401, 402, 403, 404, 405a, 405b, 405c
  BLOCK TEST 4:  20-item test, α21 = 0.42
  POST TEST:     the same test as the pretest, coef. α = 0.63]



[Figure 4. Index page of the lesson on multiplication of matrices. The
screen reads:

  MULTIPLICATION OF MATRICES

  2.1  Multiplication of A and B
  2.2  AB ≠ BA
  2.3  Scalar Product
  2.4  Matrix Product
  2.5  Quadratic Form
  2.6  The Principles of Matrix Operation
  2.7  Diagonal Matrices
  2.8  Scalar Matrix and Identity Matrix
  2.9  Attitude Questionnaire and Posttest

  You can enter the section you worked last by typing the section
  number. If this is your first time in this lesson you should begin
  from 2.1]



for the times spent in the two areas became fuzzy. (In fact, it led
to a reduction from about n = 300 to about n = 100 in some analyses.)
However, since many students requested this option, it was implemented
after the Fall 1976 semester. Such are the disadvantages of collecting
data in conjunction with learning activities in which the freedom
to which graduate students are accustomed is permitted!

By contrast, if a student did not achieve mastery at the end
of a Chanute lesson, he was required to repeat the entire lesson. The
total time taken by each student to master each lesson was recorded for
the purpose of lesson validation. It was this mastery time that was
used for our analyses of the Chanute data. The situation here was much
"cleaner" and under strict control by the instructor in typical
military style.

Finally, it should be mentioned that some students in both
courses took notes during their studying, which of course lengthened
their study times. Since the percentage of such students amounted to
only about 1 or 2 percent of the total sample, we did not discard the
data from these students, which would have been difficult to do without
closer monitoring and log keeping than we were able to effect. We
rationalized this state of affairs by regarding note taking as part of
the normal learning strategy for some people, and hence the time for
this activity should be included in their study time.



5. ANALYSIS OF PREREVISION DATA



Before presenting the results of analyses based on the
prerevision data (which, it will be recalled, are the data from the
matrix algebra pretest prior to its revision in Fall 1976) we give a
brief description of the PLATO IV system and the programs that were
implemented on it for this study. Virtually all of the programs were
written by Robert Baillie, aside from some contributions made by Tamar
Weaver, Kay Tatsuoka, and Jerry Dyer, in this order of involvement.

5.1 The PLATO IV System and the Programs



PLATO IV (Programmed Logic for Automated Teaching Operations)
is a computer-based education system developed at the University of
Illinois at Urbana-Champaign having a large-scale central computer (the
Control Data Corporation Cyber 73-74) with about 1,000 terminals
connected by telephone lines throughout the United States. Approximately
5,000 hours of instructional material have been used in several hundred
subject-matter areas, and additional lessons are constantly being
developed. The target populations range from preschool children to
graduate students, including such diverse groups as prison inmates and
special adult-education recipients like industrial workers, Armed
Forces personnel, and the physically handicapped.

The PLATO system itself was used as the primary analytic tool
for analyzing the student data collected automatically on the system,
besides serving as the deliverer of instructional material. Data
processing can be done directly without having first to punch the data
onto cards, and the results can be utilized for such diverse purposes
as adaptive testing, computer-managed instruction and item analyses
leading to modification of weak instructional units.

The computer language used is called TUTOR, which is somewhere
between FORTRAN and assembler language in its capability and precision
for numerical work. Each word of the computer is 60 bits in length,
which provides for greater accuracy than most existing computers.
This feature is especially at a premium when iterative calculations are
required, as in the computing of gamma or beta integrals, which abound
in statistical work. Approximation routines for various theoretical
distributions were written, along with that for the Kolmogorov-Smirnov
test of goodness of fit of observed with theoretical distributions.
This involved a great deal of adaptation and modification of existing
statistical programs, mainly from the IBM Scientific Subroutine Package
(IBM, 1972).

Since PLATO operates on a large-scale time-sharing mode,
special problems exist in programming for it that are similar to using a
minicomputer in terms of storage size. The core size per user is
limited to 1650 words at a time for data processing. If the
computational requirement exceeds this limit, transfer routines must be
developed for moving the data and intermediate results back and forth
between the disk storage and the core, where data processing is done
successively within the limit. A list of the computer programs written
on PLATO expressly for this study is given in Table 2.

5.2 Weibull Fitting of Item Response-Time Data

The 48 items of the matrix algebra pretest shown in Appendix
B were implemented in the test frame written by James Kraatz of CERL
and modified by K. Tatsuoka into two parts: one allows us to edit data
and the other stores and transforms the data format so as to be
acceptable by the programs for estimating the Weibull and gamma
parameters. Data editing was necessary for several reasons. One was
that we were interested only in the first-pass data, as mentioned
earlier, even though second-pass and third-pass data were also on
record. Sometimes the system would "crash" while the student was taking
the test and he/she had completed, say, the tenth item. Then the
response-time record for the remaining 38 items would consist of blanks.
The TUTOR would



Table 2

List of PLATO Programs Developed under the Project

Program Name    Brief Description

matx4           The 48-item matrix algebra test. It collects
                performance and response-time data.

edittest        Shows and allows us to edit the data. Does the
                simple item analysis.

storetest       Transforms and stores the data collected from
                matx4 into a permanent storage (dataset).

datam           Calculates the item characteristics of 48 items
                and estimates the individual student's performance
                level.

gram            Estimates the individual gain scores by regressing
                the time score difference onto the pretest, posttest,
                and other variables.

Kappa           Calculates Kappa index from a test.

sub*            Calculates various probability functions.

matsubr         Calculates the determinant of a matrix, inverse,
                eigenvalues, and eigenvectors.

cutoff          Evaluates the optimum cutoff scores of a
                criterion-referenced test; estimates the
                probabilities of false positive and negative.

llab            Plots various relationships between the test
                information, such as [illegible] vs. probability of
                false positive, etc.

wb2             Estimates Weibull parameters of the data from
                matx4.

statedit        Input-output routine with a data format that was
                adapted as the standard format for all programs
                developed by the NIE project.

wb2area         Estimates Weibull parameters from the data stored
                via statedit format.

kolmo           Kolmogorov-Smirnov testing routine for matx4 data
                format.

gamma           Kolmogorov-Smirnov testing for statedit format.

wgraf           Comparison of Weibull distributions associated
                with the items. Density functions of various
                Weibull parameters. Plotting of conditional
                response rates.

kgraf           Draws graphs of Weibull distribution and density
                function based on typed-in parameters.

Note. Various univariate and multivariate statistics routines were
developed to analyze our on-line data stored on the PLATO system. Also
several transformation programs were developed. Their descriptions and
main programmers are listed in Appendix C.

memorize the item number at which the crash occurred, and would
automatically send the student to the eleventh item on his/her second
entry. At that time the response-time data for the first 10 items
would be blanks and actual times would be recorded from the eleventh
item on. We then had to combine the two sets of data to get the score
and response-time data for the first try for that student. Sometimes
we would encounter data records in which the same response option was
chosen for all items, thus indicating that the student (or instructor)
was merely examining the items and not taking the test. Such data
would, of course, have to be deleted. All told, there was about a 20
percent attrition due to editing to clean up the data.
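The two editing steps described above, combining records split by a system crash and dropping records in which one option was chosen throughout, can be sketched as follows. This is not the edittest program; the record layout (one list entry per item, with `None` standing for a blank) and both function names are assumptions made for illustration.

```python
def merge_first_try(first_entry, second_entry):
    """Combine two partial records for one student, item by item.

    A blank (None) in the first entry is filled from the second entry;
    a value already present in the first entry is kept.
    """
    return [a if a is not None else b for a, b in zip(first_entry, second_entry)]


def is_browsing(responses):
    """True when every answered item carries the same response option."""
    answered = [r for r in responses if r is not None]
    return len(answered) > 1 and len(set(answered)) == 1


# Usage: a crash after item 2 of a four-item test (times in seconds).
before_crash = [3.1, 2.0, None, None]
after_crash = [None, None, 4.5, 1.2]
print(merge_first_try(before_crash, after_crash))
print(is_browsing(["a"] * 48))
```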

Using these cleaned-up data, a Weibull distribution was
fitted to the observed time-score distribution of each item in three
ways: once for the entire sample, secondly for the subgroup of students
who answered the item correctly (called the "OK subgroup"), and finally
for the students who got the item wrong (called the "NO subgroup"). The
fit of the observed to the theoretical distribution was tested each time
by the Kolmogorov-Smirnov test of goodness of fit. The OK subgroup
and NO subgroup had considerably different estimated Weibull parameters,
but both showed very good fits for most items. Ninety-three and 92
percent of the 48 items had p-values for the Kolmogorov-Smirnov test of
goodness of fit with Weibull distributions that exceeded .20 in the OK
and NO subgroups, respectively, and 65 and 83 percent exceeded .50,
respectively. Considering the fact that two items which needed
corrections during the fall semester of 1976 due to unclear display on
the screen or ambiguous wording showed very poor fit, with p-values of
0.0053 and 0.0550, the fit of nearly all of the other items is seen to
be satisfactory to excellent. Weibull distributions did not fit the
time-score data of the total sample as well as they did those of the
two subgroups. Only 69 percent of the items had p-values larger than
.20, and 56 percent had values larger than [illegible]. This fact
suggests that students in the two subgroups are going through different
processes to complete each item; thus the nature of the time-score data
in the two groups might be entirely different.

Tables 3 and 4 show the p-values and the maximum discrepancies
(z) for the OK and NO subgroups, respectively. Tables 5 and 6
show the estimated Weibull parameters for the 48 items in the two
subgroups.
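
The relation between the z and p columns of Tables 3 and 4 can be sketched in a few lines. The sketch rests on an assumption inferred from the tabled values, not stated in the report: that z is the maximum discrepancy D multiplied by the square root of N, and that p is taken from the asymptotic Kolmogorov series. Under that assumption the tabled z = 1.0617 for item 1 of the OK subgroup gives a p close to the tabled 0.2097. The function names and the parametrization F(t) = 1 - exp(-((t - t_0)/mu_0)^c) are illustrative choices, not the authors' program.

```python
import math


def weibull_cdf(t, t0, mu0, c):
    """Three-parameter Weibull CDF: location t0, scale mu0, shape c."""
    if t <= t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / mu0) ** c))


def ks_z(times, cdf):
    """Return sqrt(n) * D, where D is the largest gap between the empirical
    step function and the theoretical CDF (both sides of each step)."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return math.sqrt(n) * d


def kolmogorov_p_value(z, terms=100):
    """Asymptotic probability that sqrt(n) * D exceeds z."""
    if z < 0.2:
        return 1.0  # the series is numerically indistinguishable from 1 here
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# z = 1.0617 is the tabled value for item 1, OK subgroup (Table 3).
print(round(kolmogorov_p_value(1.0617), 4))
```

Because the p-value depends on the sample only through z, small subgroups (for example N = 2 or 3 in Table 4) can show p-values near 1 without the fit being informative.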

Next are shown figures illustrating the degree of fit of the
observed to the theoretical distribution for two typical items in each
of the two subgroups (Figures 5 through 8). The fits (or lack thereof)
are shown in two ways: first by superimposing the observed cumulative
distribution graph onto the theoretical curve with the estimated
parameters; and second by fitting the regression line of
ln ln(1-P)^{-1} on ln(t - t_0) to the observed scatterplot after
determining the value of t_0 yielding the maximum correlation between
these two quantities (see Section 3.1).
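
The second way of showing the fit can be sketched as follows. Under the Weibull form F(t) = 1 - exp(-((t - t_0)/mu_0)^c), the quantity ln ln(1-P)^{-1} is linear in ln(t - t_0) with slope c and intercept -c ln(mu_0), so a trial t_0 can be judged by the correlation it produces. The plotting positions P_i = (i - 0.5)/n, the grid of candidate t_0 values, and all function names are assumptions made for illustration; the procedure actually used is the one described in Section 3.1.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_weibull_by_regression(times, t0_candidates):
    """Pick the candidate t0 that maximizes the correlation between
    ln(t - t0) and ln ln(1-P)^{-1}; read c and mu0 off the regression line."""
    ts = sorted(times)
    n = len(ts)
    # Assumed plotting positions P_i = (i - 0.5) / n for the i-th ordered time.
    ys = [math.log(-math.log(1.0 - (i + 0.5) / n)) for i in range(n)]
    best = None
    for t0 in t0_candidates:
        if t0 >= ts[0]:
            continue  # ln(t - t0) requires t0 below the shortest time
        xs = [math.log(t - t0) for t in ts]
        r = pearson_r(xs, ys)
        if best is None or r > best["max_corr"]:
            mx, my = sum(xs) / n, sum(ys) / n
            slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                     / sum((x - mx) ** 2 for x in xs))
            intercept = my - slope * mx
            best = {"t0": t0, "max_corr": r, "c": slope,
                    "mu0": math.exp(-intercept / slope)}
    return best
```

A finer grid of candidates trades computing time for precision in t_0; the regression itself is ordinary least squares of y on x, as in the captions of Figures 6 and 8.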






                              Table 3

   Kolmogorov-Smirnov Tests for Matrix Algebra Pretest; OK Group

 item      p        z      N        item      p        z      N

   1)   0.2097   1.0617   77         25)   0.3489   0.9329   62
   2)   0.1571   1.1277   82         26)   0.3956   0.8979   59
   3)   0.8832   0.5853   68         27)   0.6966   0.7088   48
   4)   0.9716   0.4871   69         28)   0.5820   0.7770   44
   5)   0.2504   1.0188   79         29)   0.4534   0.8579   33
   6)   0.4419   0.8656   81         30)   0.6424   0.7410   34
   7)   0.4675   0.8486   67         31)   0.8145   0.63??   25
   8)   0.1444   1.1463   61         32)   0.9983   0.3873   18
   9)   0.4160   0.8834   67         33)   0.5176   0.8164   28
  10)   0.6237   0.7520   69         34)   0.8719   0.5942   18
  11)   0.3829   0.9072   22         35)   0.5414   0.8017   30
  12)   0.9621   0.5028   27         36)   0.5891   0.7727   38
  13)   0.6196   0.7545   42         37)   0.3821   0.9078   27
  14)   0.9918   0.4335   46         38)   0.8945   0.5759   24
  15)   0.4205   0.8803   54         39)   0.7887   0.6522   30
  16)   0.6378   0.7437   63         40)   0.9963   0.4081   18
  17)   0.8898   0.5798   28         41)   0.8640   0.6002   17
  18)   0.2979   0.9749   59         42)   0.9714   0.4875   31
  19)   0.8424   0.6160   29         43)   0.9726   0.4852   24
  20)   0.3747   0.9133   40         44)   0.8142   0.6355   21
  21)   0.5740   0.7818   37         45)   0.6191   0.7548   22
  22)   0.0184   1.5314   59         46)   0.9776   0.4754   12
  23)   0.7264   0.6908   28         47)   0.9954   0.4143   15
  24)   0.2101   1.0611   47         48)   0.9884   0.4467    7

Pretest for all subjects before 1976 Fall semester; goodness of
fit testing for Weibull distributions



                              Table 4

   Kolmogorov-Smirnov Tests for Matrix Algebra Pretest; NO Group

 item      p        z      N        item      p        z      N

   1)   0.9944   0.4208    9         25)   0.2910   0.9810   15
   2)   0.9996   0.3536    2         26)   0.7675   0.6656   19
   3)   0.7170   0.6965   17         27)   0.8400   0.6177   27
   4)   0.9941   0.4222   16         28)   0.2901   0.9817   33
   5)   0.8236   0.6291    6         29)   0.5341   0.8062   44
   6)   1.0000   0.2887    3         30)   0.9017   0.5697   43
   7)   0.4598   0.8537   16         31)   0.6827   0.7170   52
   8)   0.9673   0.4946   22         32)   0.9174   0.5553   54
   9)   0.7671   0.6659   15         33)   0.9804   0.4689   46
  10)   0.8990   0.5720   14         34)   0.7752   0.6607   57
  11)   0.0594   1.3260   61         35)   0.6533   0.7345   39
  12)   0.8380   0.6191   56         36)   0.7309   0.6881   36
  13)   0.9958   0.4116   41         37)   0.5578   0.7916   47
  14)   0.9998   0.3430   37         38)   0.5712   0.7835   45
  15)   0.7103   0.7006   27         39)   0.6857   0.7153   43
  16)   0.9525   0.5164   17         40)   0.8435   0.6153   54
  17)   0.5022   0.8262   54         41)   0.1067   1.2105   53
  18)   0.7430   0.6807   21         42)   0.5508   0.7959   40
  19)   0.5990   0.7668   52         43)   0.9164   0.5562   42
  20)   0.9359   0.5363   34         44)   0.3085   0.9658   47
  21)   0.9493   0.5205   35         45)   0.1106   1.2030   45
  22)   0.8630   0.6010   20         46)   0.9442   0.5268   50
  23)   0.0827   1.2620   50         47)   0.8813   0.5868   39
  24)   0.6237   0.7521   31         48)   0.7991   0.6454   22

Pretest for all subjects before 1976 Fall semester; goodness of
fit testing for Weibull distributions



                              Table 5

   The Three Weibull Parameters for Matrix Algebra Test Items

 items      t_0     max. corr.     c        mu_0

   1.       2.71      0.98        1.05      39.52
   2.        ??        ??         1.32      21.98
   3.        ??        ??         1.15      16.86
   4.        ??        ??         1.34      21.91
   5.        ??        ??         1.26      12.34
   6.        ??        ??         1.15      11.00
   7.        ??        ??         1.14      31.46
   8.        ??        ??         1.27      36.62
   9.        ??        ??         1.38      28.37
  10.        ??        ??         1.38      18.28
  11.        ??        ??         0.91      29.82
  12.        ??        ??         1.34     109.63
  13.       8.18      0.99        1.25      ?4.46
  14.        ??       0.99        1.44      38.17
  15.      ?5.52      0.99         ??         ??
  16.       5.47      0.99        1.16      38.47
  17.       0.00      0.98        1.17     139.65
  18.       2.33      0.99        1.52      16.89
  19.       1.67      0.98        1.1?        ??
  20.        ??        ??         1.45      66.83
  21.       0.00      0.98        1.13      68.66
  22.       3.19      0.98        1.59      19.82
  23.        ??       0.97        1.01      16.86
  24.       1.74      0.9?        1.19      12.93
  25.       2.75      0.9?        1.09       8.??
  26.       1.65      0.99        1.21       7.?9



                          Table 5 (con't)

   The Three Weibull Parameters for Matrix Algebra Test Items

 items      t_0     max. corr.     c        mu_0

  27.        ??        ??          ??         ??
  28.        ??        ??          ??         ??
  29.        ??        ??          ??         ??
  30.        ??        ??          ??         ??
  31.        ??        ??          ??         ??
  32.        ??        ??          ??         ??
  33.        ??        ??          ??         ??
  34.        ??        ??          ??         ??
  35.        ??        ??          ??         ??
  36.       ?.9?      0.90        0.73      12.49
  37.       4.73      0.99        1.?2      14.34
  38.       2.66      1.00        1.06      13.32
  39.       2.66      0.99        1.28       8.12
  40.       3.75      0.99        0.70       8.09
  41.      15.35      0.9?        1.14      25.48
  42.       8.46      1.00        1.47      25.51
  43.       1.29      1.??        0.98      17.20
  44.       2.80      1.??        0.?9      12.86
  45.       2.65      0.99        1.08      10.?4
  46.       2.08      0.99        0.94      22.52
  47.       ?.68      1.?0        0.88        ??
  48.       6.56      0.99        0.81      10.72

Pretest given before Fall 76 semester, OK subgroup



                              Table 6

   The Three Weibull Parameters for Matrix Algebra Test Items

 items      t_0     max. corr.     c        mu_0

   1.      10.66      0.99        0.88      33.96
   2.       0.00      1.00        0.98       3.53
   3.       0.00      0.?7        1.06      13.05
   4.       1.73      0.99        0.69      14.54
   5.       0.00      0.??        1.07       6.02
   6.       ?.90      1.00        0.55       2.13
   7.       1.52      0.97        0.78      33.31
   8.       0.00      0.99        1.05      37.07
   9.       8.73      0.93        0.35      15.16
  10.       0.00      0.98        0.92      20.69
  11.       1.35      0.96        1.11      31.00
  12.       0.59      1.00        0.91      69.67
  13.       4.49      1.00        1.01      33.03
  14.       0.00      1.00        1.13      46.24
  15.       1.52      0.98        0.66      60.92
  16.       0.92      0.99        0.38      19.33
  17.       0.72      0.99        0.83      51.96
  18.       1.86      0.99        0.80      15.30
  19.       0.26      0.93        0.92      25.56
  20.       0.92      0.99        0.62      22.99
  21.       0.9?      0.99        0.69      34.93
  22.       0.18      0.99        1.04      13.43
  23.       2.69      0.99        1.23      14.41
  24.       0.00      0.97        1.56      12.48
  25.       0.56      0.99        1.01      13.?3
  26.       0.64      0.99        0.76      17.01



                          Table 6 (con't)

   The Three Weibull Parameters for Matrix Algebra Test Items

 items      t_0     max. corr.     c        mu_0

  27.       0.73      0.99        0.95       9.32
  28.       0.94      0.99        ?.68       6.21
  29.       0.63      0.99        ?.87      13.67
  30.       0.85      0.99        0.93      13.07
  31.       0.66      1.00        ?.95      ?2.63
  32.       1.03      0.99        0.94      51.93
  33.       1.77      0.99        0.80      53.73
  34.       0.55      0.99        1.15      31.43
  35.       0.94      0.99        0.64      18.60
  36.       0.71      0.99        1.00       7.22
  37.       0.52      0.99        0.98      16.30
  38.       0.95      0.99        0.35      10.57
  39.       0.37      1.00        0.92       7.77
  40.       0.78      1.00        1.10       3.34
  41.       0.32      0.99        1.16      39.03
  42.       1.97      0.99        0.75      11.66
  43.       0.44      0.99        0.97       9.??
  44.       0.88      0.99        0.75       5.99
  45.       0.96      0.99        0.74       6.34
  46.       0.70      1.00         ??       11.70
  47.       0.00      0.99        0.82      17.10
  48.       0.00      0.99        0.96      24.23

Pretest given before Fall 76 semester, NO subgroup



[Plot: observed cumulative distribution of response times superimposed
on the fitted Weibull curve. OK group, question number 1: t_0 = 2.71,
max. corr. = 0.98, c = 1.053, mu_0 = 39.52; n = 77. Display text: "The
hypothesis is that the data are from a Weibull distribution. Can reject
hypothesis with probability 0.2097 of being wrong. z for this sample is
1.062." x axis: response times (3 to 416).]

Figure 5  Goodness of fit test for the time-score data and Weibull
          distribution function



[Plot: question 1, n = 77, x(1) = 3, x(n) = 416, max. correl. = 0.98,
p = 0.2097; t_0 = 2.71, c = 1.053.]

Figure 6  Plot of ln ln(1-P)^{-1} as y axis and ln(t - t_0) as x axis.
          The line is the regression equation of y on x.



[Plot: observed cumulative distribution of response times superimposed
on the fitted Weibull curve. OK group, question number 16: t_0 = 5.47,
max. corr. = 0.99, c = 1.161, mu_0 = 38.47; n = 63. Display text: "The
hypothesis is that the data are from a Weibull distribution. Can reject
hypothesis with probability 0.6378 of being wrong. z for this sample is
0.7437." x axis: response times.]

Figure 7  Goodness of fit test for the time-score data and Weibull
          distribution function



[Plot: question 16, n = 63, x(1) = 6, max. correl. = 0.99, p = 0.6378;
t_0 = 5.47, mu_0 = 38.47.]

Figure 8  Plot of ln ln(1-P)^{-1} as y axis and ln(t - t_0) as x axis.
          The line is the regression equation of y on x.



Since the Kolmogorov-Smirnov testing procedure looks only at the
maximum discrepancy (z) between the observed and theoretical cumulative
distributions, the resultant "goodness-of-fit" depends partly on the
size of the intervals or units of measure used. In the matrix algebra
pretest, the item response times were recorded to the nearest
second, and hence this undesirable feature of the Kolmogorov-Smirnov
test seldom manifests itself there. Even so, when the time range is
small (as in true-false items) and the sample size is relatively large
there are occasions when the p-value is rather small despite the fact
that the fit looks very good to the eye. This trouble (if indeed it
be a trouble) increases when we come to analyze lessons, where the times
taken are recorded only to the nearest 10 seconds, and is further
aggravated when we get to the Chanute data, where the time unit is
minutes and some lessons take only 25 minutes at a maximum.



5.3 Characteristics of the Pretest Items 

The performance-score data from the pretest items were
processed by the computer program developed for computing various item
parameters. (See description in Appendix D.) Some of the results are
displayed in Table 7. The first column in this table, labeled
"Difficulty 1," shows the values of the traditional difficulty index,
i.e., the proportion of subjects getting the items right. The second
column ("Difficulty 2"), on the other hand, gives values of a modified
difficulty index due to Loeschner (personal communication). It is
defined as the estimated average probability that the particular item
(i) is answered correctly but another item (j) is answered incorrectly
when a randomly drawn subject is given both items i and j. The formula
is



(5.1)   \[ \text{Difficulty 2} = \frac{1}{n-1} \sum_{j=1,\; j \neq i}^{n} \frac{n_{ij}}{N}, \]

where   n      is the number of items in the test,
        N      is the number of subjects taking the test, and
        n_{ij} is the number of subjects who got item i right but item j
               wrong.
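
As an illustration of the two indices, the following sketch computes both from a subjects-by-items matrix of 0/1 scores. It assumes the reading of formula (5.1) as the average, over the n - 1 other items j, of the proportion n_ij / N; the function name and the small data set are hypothetical.

```python
def difficulty_indices(scores):
    """scores: one list of 0/1 item scores per subject.
    Returns (Difficulty 1, Difficulty 2), each a list with one value per item."""
    n_subjects = len(scores)
    n_items = len(scores[0])
    # Difficulty 1: proportion of subjects answering the item correctly.
    diff1 = [sum(row[i] for row in scores) / n_subjects for i in range(n_items)]
    # Difficulty 2: average over j != i of the proportion of subjects who
    # got item i right but item j wrong (assumed reading of formula 5.1).
    diff2 = []
    for i in range(n_items):
        n_ij_total = 0
        for j in range(n_items):
            if j != i:
                n_ij_total += sum(1 for row in scores
                                  if row[i] == 1 and row[j] == 0)
        diff2.append(n_ij_total / ((n_items - 1) * n_subjects))
    return diff1, diff2


# Hypothetical, perfectly scalable data: 4 subjects, 3 items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
d1, d2 = difficulty_indices(example)
```

For perfectly scalable data, n_ij / N reduces to the difference between the two proportions correct whenever item i is the easier one, and to zero otherwise.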

The reason we regard this alternative "difficulty" (actually
"facility") index worth considering along with the traditional
difficulty index is as follows. The topics covered in a matrix test
are, by their nature, hierarchically ordered (or, more strictly
speaking, linearly related). For instance, in order to be able to
compute a matrix inverse, one must know what an identity matrix is,
must know how to multiply matrices, must know what cofactors and
adjoints are, and how to calculate the determinant of a (square)
matrix. These prerequisite knowledges must have been mastered earlier
and the required calculations






                              Table 7

        Difficulty indices, discriminating power, and
   the correlation of the total score and item-time score

 item   Difficulty 1   Difficulty 2   r_{s,i}   r_{s,ti}

   1        .90            .44          .43       -.03
   2        .95            .48          .48        .03
   3        .79            .39          .32        .08
   4        .80            .39          .40        .07
   5        .92            .46          .39       -.07
   6        .94            .47          .45        .10
   7        .78            .36          .50        .02
   8        .71            .32          .55        .16
   9        .78            .36          .53       -.08
  10        .80            .37          .59        .05
  11        .26            .13          .07        .06
  12        .31            .13          .35        .39
  13        .49            .20          .49        .37
  14        .54            .23          .50        .28
  15        .63            .27          .62        .14
  16        .73            .33          .58        .13
  17        .33            .11          .58        .39
  18        .69            .30          .64        .19
  19        .38            .12          .53        .27
  20        .48            .18          .60        .24
  21        .44            .17          .56        .41
  22        .69            .31          .50        .30
  23        .33            .13          .43        .35
  24        .55            .23          .56        .33
  25        .72            .32          .61        .20
  26        .69            .32          .4?        .01
  27        .56            .23          .53        .38
  28        .51            .22          .48        .21
  29        .38            .1?          .59        .38
  30        .40            .16          .50        .38
  31        .29            .11          .44        .39
  32        .22            .08          .36        .38
  33        .33            .11          .60        .?7
  34        .21            .08          .39        .34
  35        .35            .14          .41        .46
  36        .44            .18          .51        .32
  37        .31            .12          .43        .42
  38        .28            .11          .41        .35
  39        .34            .12          .57        .42
  40        .21            .07          .41        .41
  41        .20            .07          .40        .42
  42        .?6            .13          .62        .46
  43        .28            .10          .52        .43
  44        .24            .09          .41        .38
  45        .26            .07          .65        .32
  46        .14            .03          .55        .34
  47        .17            .08          .19        .42
  48        .22            .07          .54        .07



carried out without error in order to achieve the goal of getting a
matrix inverse. These prerequisites were taught in the first three
lessons of the matrix algebra course. The various test items in the
pretest were roughly ordered by the difficulty of the topic involved.

Bob Linn (personal communication) suggested that the difficulty
index (proportion of correct answers) of items should be expected
to have a perfect negative rank-order correlation with difficulty of
topic. This is so because, for instance, nobody should be able fully
to understand the import of the identity matrix without first knowing
how to multiply matrices. Thus, if item i tests for the knowledge of
matrix multiplication, while item j tests for understanding the concept
of the identity matrix, it is natural to expect that anyone who got
item j correct will also have answered item i correctly.

Now, we sorted our subjects x items score data by item
"difficulty" (proportion of subjects answering correctly) and by total
score earned by each subject. The result is a plot of dots and blanks
in a pattern like that shown in Figure 9, which resembles a scalogram.
The upper left-hand corner represents the score (dot = 1, blank = 0) on
the easiest item earned by the highest-scoring subject. Then the points
at which the number of dots to the left equals the total score were
connected by a "step line," which Sato (1977) called the "S-curve." If
the data were perfect, i.e., if the items were scalable in Guttman's
(1947) sense, and the item scores were error-free, then the sum of the
estimated conditional probabilities p(X_i = 1 | X_j = 1) over j = i + 1,
i + 2, . . ., 48 (where X_i and X_j are the scores on items i and j,
which are either 1 or 0) will be given by the shaded area in the figure.
This value is associated with the relative importance of item i to the
items testing for more advanced topics. We related these sums for the
48 items with the difficulty of the topic being tested and found a
nearly perfect rank-order matching; only three items were disarrayed.
(Thanks are due to Bob Linn for suggesting that we consider the
conditional probabilities. It was our idea to sum them.)
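
The sum of conditional probabilities described above can be sketched as follows. The sketch orders the items from easiest to hardest by number of correct answers and, for each item i, adds the estimates of p(X_i = 1 | X_j = 1) over the harder items j. The function name, the handling of an item that nobody answered correctly (its term is skipped), and the example data are assumptions for illustration only.

```python
def conditional_probability_sums(scores):
    """scores: one list of 0/1 item scores per subject.
    Returns (item index, sum over harder items j of p(X_i=1 | X_j=1)),
    listed from the easiest item to the hardest."""
    n_items = len(scores[0])
    right = [sum(row[i] for row in scores) for i in range(n_items)]
    # Easiest item first (largest number of correct answers).
    order = sorted(range(n_items), key=lambda i: -right[i])
    sums = []
    for pos, i in enumerate(order):
        total = 0.0
        for j in order[pos + 1:]:
            if right[j] == 0:
                continue  # conditional probability undefined; skipped by assumption
            both = sum(1 for row in scores if row[i] == 1 and row[j] == 1)
            total += both / right[j]
        sums.append((i, total))
    return sums


# Hypothetical, perfectly scalable data: every conditional probability is 1,
# so each sum equals the number of harder items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
result = conditional_probability_sums(example)
```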

It should now be clear that Difficulty 2 is related to the
sum of the conditional probabilities complementary to those considered
above. Hence this alternative "difficulty" index is of interest quite
apart from the traditional difficulty index.

The discriminating power r_{s,i} in the classical test theory
sense is shown in the third column, while the correlation r_{s,ti}
between item response time and total test score is given in the last
column. Since the test is a pretest for a difficult subject that
requires higher cognitive skills and most of the students were not
mathematics or physical-science majors, almost 65 percent of the items
were tough problems. The 88 students on the basis of whom the results
in Table 7 were




Figure 9  Graphical explanation of difficulty 2
          (Loeschner's facility index)



Note. The 48 items are ordered by their difficulty index so that the
easiest item is placed at the leftmost, and the hardest at the rightmost
position. The 86 students are also ordered by their scores from the top
line as the highest, to the bottom as the lowest. A small dot stands for
a correct answer, 1, and a blank is for wrong, 0. The two step lines
represent the total scores of each student and the number of right
answers for each item, respectively. The shaded area represents
difficulty 2 of item i when a test is perfect.






computed excluded those who were taking (or who had completed) the most
advanced of the three statistics courses using the matrix algebra
course, and hence the r_{s,i} values are not very high. It may be noted
that r_{s,ti} increases as the difficulty indices get smaller, which is
reasonable because a person who gets higher test scores tends to stick
longer with difficult items while less able students give up on them
and go on to the next item.

Table 8 shows the correlations among the four measures (two
of which are themselves correlation coefficients) that were displayed
in Table 7 and were discussed above. It is seen that the two types of
item difficulty (actually "facility") indices correlate almost perfectly
with each other. (It might therefore be argued that the second index,
Difficulty 2, is gratuitous, but it does have some desirable properties
discussed above that are not possessed by the traditional difficulty
index.) On the other hand the two "discrimination indices" are
uncorrelated with each other, and instead r_{s,ti} (or rather its Z
transform) shows a moderate negative correlation with the "difficulty"
indices.

                              Table 8

        Correlations Among Two Item Difficulty Indices and
               Two Discriminating Power Measures

                          1.       2.       3.       4.

   1.  Difficulty 1       1
   2.  Difficulty 2     .991       1
   3.  Z(r_{s,i})*      .241     .175       1
   4.  Z(r_{s,ti})*    -.492    -.544    -.018       1

*These are Fisher's Z-transforms of the correlation coefficients
shown in the parentheses.
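
For reference, a minimal sketch of the transformation named in the footnote: Fisher's Z is computed as Z = (1/2) ln((1 + r)/(1 - r)), and the transformed values are then correlated with another index across items. The function names are illustrative, and the sketch does not reproduce the entries of Table 8.

```python
import math


def fisher_z(r):
    """Fisher's Z-transform of a correlation coefficient r (|r| < 1)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlate_after_z(r_values, index_values):
    """Correlate Z-transformed item correlations with another item index."""
    return pearson_r([fisher_z(r) for r in r_values], index_values)
```

The transform stretches values near +1 or -1, which makes correlation coefficients better behaved as variables in a further correlation.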



Recalling from Table 7 that no item actually had negative r_{s,ti}
values to speak of (only two values were negative, but they were
practically zero), a low r_{s,ti} value means that students having high
total scores and those with low total scores showed little difference
in time taken to respond to that item. Hence, the moderate negative
correlations between r_{s,ti} and the "difficulty" indices, just noted,
imply the following relations: It was the easy items (i.e., those with
large "difficulty" index values) that tended, by and large, to exhibit
little difference in response time between those with high and low
prior knowledge of matrix algebra. Conversely, the more difficult items
tended to show larger differences in response time between high and low
total score students, with high scoring students tending to take longer
time. We may therefore infer that students with higher prior knowledge
of matrix algebra tended to persevere longer on difficult items while
those with low prior knowledge tended to give up on them sooner. This
is a reasonable result, and by itself is almost trite (except that it
does seem to confer some construct validity to the test), but it has
some implications for subsequent interpretations of the Weibull shape
parameter, c.

The Relation between Discriminating Power and Time. Woodbury
(1963) and Novick (1966) developed a model involving time that
identifies the measurement process with the realization of a stochastic
process. However, their definition of the time parameter, t, is the
examiner-controlled time allowed for the test, in other words, the
length of the test, whereas the time score we have been using is the
time taken by an examinee as needed. Their studies showed that there
is some optimum time that maximizes the reliability of a test. Even
though their definition of time for a test and its relationship with
test theory are quite different from our usage, we were convinced that
by controlling time after the fact in the scoring process, we could
demonstrate a similar relation from our data between time score and
some established concept in test theory. Testing out our hunch, we
found two interesting empirical relations between time and
discriminating power. Specifically, when a test item is easy, there is
an optimal time point within a relatively short time interval such that
the discriminating power of the item becomes the largest. On the other
hand, for difficult items, the longer the time allowed the better the
discriminating power. These relations were observed fairly consistently
for 48 items in two samples of about 80 and 100 subjects (i.e., data
from the prerevision and postrevision matrix algebra pretests) and also
for the posttests for the lessons on multiplication, matrix inversion,
transformations, and eigenvalues and eigenvectors.

Figures 10 through 12 illustrate these relations, while Table
9 displays the numerical detail on which Figure 10 is based.
(Corresponding tables for Figures 11 and 12 are omitted to save space.)
To



[Plot: discriminating power (0.0 to 1.0) on the y axis against % of
subjects (up to 100) on the x axis.]

Figure 10  Discriminating powers over 10 cutoff times (in seconds)
           for OK subgroup


                    Table 9

        10 Points in Figure 10, item 20

    %*     cut time      N      discriminating power**

    10        20          4           0.013
    20        29          8           0.139
    30        ??         12           0.272
    40        37         16           0.456
    50        42         20           0.497
    60        54         24           0.437
    70        67         28           0.432
    80        ??         32           0.311
    90       110         36           0.145
   100        ??         40           0.000

* % of subjects, ** discriminating power



[Plot: discriminating power on the y axis against % of subjects on the
x axis.]

Figure 11  Discriminating powers over 10 cutoff times (in seconds)
           for all subjects. The item is question 20, which is the
           same item in Figure 10.



[Plot: discriminating power (0.0 to 1.0) on the y axis against % of
subjects (0 to 100) on the x axis.]

Figure 12  Discriminating powers over 10 cutoff times (in seconds)
           for all subjects. The item is question 16.



explain the construction of Figure 10 and the contents of Table 9, the
40 subjects in the OK subgroup for Item 20 were first arranged in
ascending order of their times taken. The fastest 10 percent (n = 4)
of the subjects, with response times no greater than 20 seconds, were
taken and only these subjects were regarded as having answered Item 20
correctly. The Item-20 score and the total test score for 36 subjects
were thus modified, with the scores (1 or 0) on the other items left
unchanged for all 40 subjects. The point biserial correlation
calculated between the modified Item-20 score and the modified total
score for the 40 subjects is what is shown as the first entry, .018, in
the last column ("adjusted discriminating power") of Table 9 and is the
ordinate of the first point plotted in Figure 10. Next the fastest
20 percent, with response times no greater than 29 seconds, were scored
1, and the others scored 0, on Item 20 and the total scores accordingly
modified. The point biserial thus calculated is the second entry,
.139, of the last column in Table 9 and is the ordinate of the second
point in Figure 10. The same process is repeated for the remaining
cutoff percentages, 30, 40, . . ., 90 percent, yielding adjusted
discriminating powers .272, .456, . . ., .145, respectively. The last
cutoff percentage (100 percent) necessarily yields a point biserial
value of zero, because all 40 subjects now are scored 1 on Item 20,
since only the OK subgroup was used. For this subgroup Item 20 was
obviously an extremely easy item (everyone got it right), and the
maximum adjusted discriminating power .497 occurs when the cutoff
percentage is 50 percent, with a cutoff time of 42 seconds, thus
illustrating the time vs. discriminating power relation stated above
for an easy item.
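
The construction just described can be sketched as a short routine. It is a simplified illustration, not the program used for Table 9: the fastest subjects are selected by rank rather than by a cutoff time (so ties at the cutoff are broken arbitrarily), a constant item score is given a correlation of zero to match the 100 percent case, and all names and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant.
    With a 0/1 variable this equals the point biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def adjusted_discriminating_power(times, item_scores, total_scores, fraction):
    """Credit the item only to subjects among the fastest `fraction` who
    answered it correctly, adjust their totals, and correlate the two."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    fast = set(order[:int(round(fraction * n))])
    new_item, new_total = [], []
    for k in range(n):
        kept = item_scores[k] if k in fast else 0
        new_item.append(kept)
        # Remove the original item credit, then add back the adjusted credit.
        new_total.append(total_scores[k] - item_scores[k] + kept)
    return pearson_r(new_item, new_total)


# Hypothetical OK-subgroup data: everyone answered the item correctly.
times = [10, 20, 30, 40]
item = [1, 1, 1, 1]
totals = [5, 4, 3, 2]
curve = [adjusted_discriminating_power(times, item, totals, q / 10)
         for q in range(1, 11)]
```

The same routine covers the total-sample variant, because a fast subject who answered the item wrongly keeps a score of 0.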

Figure 11 shows the time vs. discriminating power relation
for the same item, but now using the total sample of 74 "earnest"
subjects (i.e., those who chose an option at all for Item 20). Of
course not all of the fastest 10 percent (n = 7) were scored 1 on Item
20 this time, but only those among the seven who actually got the item
right were so scored. Similar scoring was used for the fastest 20
percent, fastest 30 percent, etc., through the entire group. Item 20 is
now a moderately difficult item, with 40 out of 74 subjects getting it
right, and the maximum discriminating power, .597, now occurs at cutoff
percentage 80 percent with cutoff time 70 seconds.

Figure 12 presents an exception to the rule. Item 16 was an
easy item (73 percent got it right, as shown in Table 7), and yet the
maximum discriminating power, .462, occurs with 100 percent cutoff.
Thus, the empirical generalization stated earlier is not a perfect one,
suggesting that other factors besides item difficulty must affect the
relation between discriminating power and time. Theoretical work on
this issue is planned.






5.4 Interpretation of Weibull Parameters

We saw in Section 5.2 that the response-time data for
practically all of the 48 matrix-algebra pretest items were well fitted,
and those for a large majority of them were excellently fitted, by
Weibull distributions. It is now time to engage in some interpretations
of the observed fit. The first thing to note is that the Weibull
distribution for an item in the OK subgroup (those who got the item
correct) and that in the NO subgroup showed interesting differences.
This is apparent from a comparison of Tables 5 and 6, given earlier.
Let us now focus on a couple of specific items and compare their
Weibull parameters for the two subgroups.

For example, Item 10, which asks for the transpose of a 2 x 2
matrix (see Appendix B), shows quite a contrast between the two sets of
Weibull parameters. The OK subgroup has larger values for all three
basic parameters than does the NO subgroup:

                    t_0       c       mu_0      mu

   OK subgroup     3.52     1.33     29.82     30.73
   NO subgroup     0.00      .92     20.69     21.50

Here mu is the theoretical mean, denoted earlier by E(t) and related to
the three basic parameters through equation (2.7):

        \[ \mu = t_0 + \mu_0 \, \Gamma(1 + 1/c). \]
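
As a quick numerical check of the relation quoted from equation (2.7), the sketch below evaluates the theoretical mean from the three basic parameters. It uses the NO-subgroup values for Item 10 given above; since the tabled c is rounded to two places, only approximate agreement with the tabled mean of 21.50 can be expected. The function name is illustrative.

```python
import math


def weibull_mean(t0, mu0, c):
    """Theoretical mean of the three-parameter Weibull distribution:
    t0 + mu0 * Gamma(1 + 1/c)."""
    return t0 + mu0 * math.gamma(1.0 + 1.0 / c)


# Item 10, NO subgroup: t_0 = 0.00, c = .92, mu_0 = 20.69.
print(round(weibull_mean(0.00, 20.69, 0.92), 2))
```

For c = 1 the gamma factor equals 1 and the mean is simply t_0 + mu_0; for shape values near those in Tables 5 and 6 the factor stays close to 1, so mu is mostly determined by t_0 and mu_0.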

Similarly, Item 16 (finding the product of a (2x3) and a (3x2) matrix)
has Weibull parameters as follows:

                    t_0       c       mu_0      mu

   OK subgroup     5.47     1.16     38.47     3?.94
   NO subgroup      .92      .38     19.33     20.85

Since t_0 is the theoretical minimum time required for the
examinees to arrive at their answer, it is only natural that the NO
subgroup had the smaller value for both items. Most members of this
subgroup simply pressed the NEXT key or made an incorrect guess. They
usually don't know what the transpose of a matrix is or how to multiply
matrices. They may have had some exposure to the rudiments of matrix
algebra in a college algebra course a long time ago, but since they had
no







further contact with matrices they have forgotten what little they knew.
Thus it seems safe to infer that anyone whose response time to an item
is closer to the t_0 value for the NO subgroup than to the t_0 for the
OK subgroup must have guessed at the answer instead of attempting to
solve the problem. However, this is still in the realm of speculation,
and we will examine the issue further in the context of posttest data.

Thirty-seven out of the 48 items have larger values of c (the
shape parameter) in the OK subgroup than in the NO subgroup, but the
opposite is true for 11 items. Six of these 11 items were of the
true-false type, and two involved either an ambiguity of wording or an
inconspicuous symbol (' for the transpose of a matrix). Thus, a
majority of the items for which the c value in the NO subgroup was
larger than in the OK subgroup had something unusual about them.

Returning to the two items cited above, both were among the 37
"normal" items for which the OK subgroup had the larger value of c.
Looking at the mu values for the two items, we can infer that Item 10
was easier than Item 16. (In fact Table 7 shows that the "difficulty
index," which should be called the "ease index," had the values .802
and .733 for the two items, respectively.) The difficulty index and c
are indeed positively correlated, at least for the OK subgroup.

Table 10 in the next subsection shows that the c for the OK
subgroup correlates (across the 48 items) .41 with the difficulty
index³ as computed for the total sample, while the c for the NO
subgroup correlates -.15. These are not exactly high correlations, but
when the c is based on the total sample (not shown in Table 10) the
correlation increases to .56. If the c from the NO subgroup is
partialled out, the partial correlation between c in the total sample
and the difficulty index is .70. Since the time-score distribution in
the total sample does not fit the Weibull distribution as well as those
in the OK and NO subgroups separately, however, the parameter c based
on the total sample may not be very meaningful. We may have to
introduce a composite Weibull distribution (cf. Mann, Schafer and
Singpurwalla, 1974, pp. 140-142) to fit the total sample, but we have
not done so in the present study.

³ Unless the suffix '2' is attached, "difficulty index" will always
denote the traditional difficulty index and not the alternative index
introduced by Loeschner.

The upshot of the foregoing discussions is that the shape
parameter c has something to do with item difficulty, but not so much
as to be identified with it. In a sense, c has a "richer" meaning than
the usual concept of difficulty, since it determines the shape of the
cumulative distribution curve. The nature of its relationship with the
distribution shape is illustrated in Figure 13, which depicts the
Weibull distribution with t_0 = 10, μ_0 = 30, c = 1.5 (curve 1) and
that with t_0 = 10, μ_0 = 30, c = .8 (curve 2). Curve 1 is seen to
approach its asymptote more rapidly than curve 2 does.** Although, by
definition, the graph of any distribution function must asymptote to
F(∞) = 1, it may approach different values within "reasonable" ranges
of the time variable, thus indirectly reflecting different item
difficulty levels.
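As a worked illustration of the two curves, the block below evaluates both distributions at t = 100 seconds. It assumes the usual three-parameter Weibull form, which is consistent with the mean formula quoted later in this section; the report's own defining equation lies outside this passage, and the evaluation point t = 100 is chosen here only for illustration.

```latex
% Assumed three-parameter Weibull distribution function
F(t) = 1 - \exp\!\left[-\left(\frac{t - t_0}{\mu_0}\right)^{c}\right],
\qquad t > t_0 .

% Both curves use t_0 = 10 and \mu_0 = 30, so at t = 100 the
% standardized time is (100 - 10)/30 = 3.
\text{Curve 1 } (c = 1.5):\quad 3^{1.5} \approx 5.196,
\qquad F(100) \approx 1 - e^{-5.196} \approx 0.994

\text{Curve 2 } (c = 0.8):\quad 3^{0.8} \approx 2.408,
\qquad F(100) \approx 1 - e^{-2.408} \approx 0.910
```

Under this assumed form, curve 1 is within one percent of its asymptote at 100 seconds while curve 2 is still about nine percent short of it.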

To further illustrate how c determines distribution shape
with real item data, we again return to Item 16. Figure 14 shows the
distribution curves for Item 16 in both the OK subgroup (curve 1) and
the NO subgroup (curve 2). Curve 1 starts at t_0 = 5.47 on the time
axis and converges to 1 faster than does Curve 2. It is interesting to
note that about 40 percent of the NO-subgroup examinees leave this item
before the (theoretical) minimum time, 5.47 seconds, for the OK
subgroup. Ten percent of the NO-subgroup examinees spent too long a
time without achieving success while almost all subjects in the OK
subgroup arrived at the answer in 130 seconds. These facts suggest
that it is not necessary to allow more than 130 seconds for people to
answer Item 16. The density-function curves for both subgroups are
also shown in Figure 14, but their scale is different from that of the
distribution-function curves. The unimodality when c > 1 and absence
of a mode when c ≤ 1, alluded to in Section 2.2, is here seen for fits
to real data.
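The percentages quoted for Item 16 can be checked against the fitted parameters. The sketch below assumes the standard three-parameter Weibull distribution function (the report's own equation is not reproduced in this passage) and uses the estimates printed in the Figure 14 legend. The function and variable names are illustrative only.

```python
import math


def weibull_cdf(t, t0, mu0, c):
    """Assumed Weibull CDF: 1 - exp(-((t - t0) / mu0) ** c) for t > t0."""
    if t <= t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / mu0) ** c))


# Item 16 estimates as printed in the Figure 14 legend.
OK = dict(t0=5.4740, mu0=38.47, c=1.161)
NO = dict(t0=0.9206, mu0=19.33, c=0.3764)

# Share of the NO subgroup gone before the OK subgroup's minimum time.
no_before_ok_minimum = weibull_cdf(5.4740, **NO)
# Shares of each subgroup that have responded by 130 seconds.
ok_by_130 = weibull_cdf(130.0, **OK)
no_by_130 = weibull_cdf(130.0, **NO)

print(f"NO subgroup finished before 5.47 s: {no_before_ok_minimum:.2f}")
print(f"OK subgroup finished by 130 s:      {ok_by_130:.2f}")
print(f"NO subgroup still working at 130 s: {1 - no_by_130:.2f}")
```

Under these assumptions the three shares come out near .44, .98 and .13, in rough agreement with the "about 40 percent," "almost all" and "ten percent" read off the curves.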

The conditional response rate (CRR), which formed the
theoretical basis for deriving the Weibull distribution in Section 2.1,
is here given for real data, that for Item 16 again. Curve 1 in
Figure 15 shows the CRR for the OK subgroup, with c = 1.16, and Curve 2
that for the NO subgroup with c = .38. Curve 1 increases monotonically
with time, indicating (loosely) that the longer a person sticks with
Item 16 the more likely it becomes that he/she will get it right if
indeed he/she gets it right at all. Curve 2, on the other hand,
decreases rapidly with time. Among people who do not get Item 16
right, the longer they stick with it, the less likely it becomes that
they will respond to it the next instant, given that they haven't
responded to it so far. In other words, many subjects gave the wrong
answer early on but the "giving up" rate slows down as time goes by.
It might be said that CRR expresses the degree of involvement in an
item by examinees. But further explication of this concept must await
further research.
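The opposite trends of the two CRR curves follow directly from the sign of c - 1. The sketch below assumes the standard Weibull hazard function f(t)/(1 - F(t)); the report's equation for the CRR is not part of this passage, so the formula, the function name and the chosen time points are assumptions made for illustration.

```python
def weibull_crr(t, t0, mu0, c):
    """Assumed Weibull hazard: (c / mu0) * ((t - t0) / mu0) ** (c - 1)."""
    if t <= t0:
        raise ValueError("the rate is only defined after the minimum time t0")
    return (c / mu0) * ((t - t0) / mu0) ** (c - 1.0)


# Item 16 estimates as printed in the figure legends.
OK = dict(t0=5.4740, mu0=38.47, c=1.161)
NO = dict(t0=0.9206, mu0=19.33, c=0.3764)

times = [10, 20, 40, 80]  # seconds, chosen only for illustration
ok_rates = [weibull_crr(t, **OK) for t in times]
no_rates = [weibull_crr(t, **NO) for t in times]

for t, ok_rate, no_rate in zip(times, ok_rates, no_rates):
    print(f"t = {t:3d} s   OK: {ok_rate:.4f}   NO: {no_rate:.4f}")
```

With c = 1.161 the rate rises slowly over time, and with c = .3764 it falls steeply at first and then flattens, which is the pattern described for Curves 1 and 2.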



** In fact, it was this relationship with the speed of "convergence"
to asymptote that led us to denote the shape parameter as c.



[Figure 14. Weibull distribution and density function, question
number 16. Horizontal axis: time in seconds (0 to 140). The vertical
scale for f(t) is magnified. Legend: OK only: c = 1.161, t_0 = 5.4740,
μ_0 = 38.47, mean = 36.51; NO only: c = 0.3764, t_0 = 0.9206,
μ_0 = 19.33, mean = 76.68.]

[Figure 15. Conditional response rate function f(t)/(1 - F(t)),
question number 16. Horizontal axis: time in seconds. Legend: OK only:
c = 1.161, t_0 = 5.4740, μ_0 = 38.47; NO only: c = 0.3764, t_0 = 0.920,
μ_0 = 19.33.]



Next, we identified five sets (four pairs and one triplet)
of items that respectively had the same difficulty index in the
traditional sense and were on the difficult side. The c-values
(determined for the OK group) of the items within the "isodifficulty"
sets were consistently and fairly substantially different. Examination
of the item contents revealed that the c-values seemed to reflect a
more intuitively plausible notion of "item difficulty" than did the
traditional difficulty index. The data are as follows:



Difficulty Index        Item      c-value
(Proportion Passing)    Number    (OK group)    Item Content

      .279                38        1.062       A property of orthogonal transformations
      .279                43         .982       Variances and eigenvalues

      .314                12        1.336       Tricky problem on order of matrices
      .314                37        1.019       Property of orthogonal transformations

      .290                17        1.173       Symbols for matrix/vector operations
      .290                23        1.014       If AB = AC then B = C
      .290                33        1.240       Matrix inverse: numerical example

      .349                35        1.917       Orthogonal transformation: numerical example
      .349                39        1.276       Simple property of orthogonal transformations

      .442                21        1.126       Row-wise expansion of determinants
      .442                36         .784       Property of orthogonal transformations



The foregoing data suggest that the Weibull parameter c may be a more
sensitive measure of the conceptual difficulty of an item than is the
traditional difficulty index defined as the proportion of examinees
getting the item right. In fact, for the OK group the items are
completely undifferentiable by the traditional difficulty index, since
the value is 1.00 for every item. Yet c enables us to differentiate
among such items by detecting different rates at which the asymptote
is approached.

Items 36 through 39 all ask about simple properties of orthogonal
transformations. Their difficulty indices are .442, .314, .279 and
.349, respectively, but their c values increase monotonically in the
order the items were presented: .784, 1.019, 1.062, 1.276. This makes
sense when we consider the concept of, or the reasoning behind, the
Weibull parameter c in general. The CRR for questions related to the
same topic seems to increase as the familiarity with the topic
increases, as it should from earlier to later items on the same topic.
Thus, the parameter c seems to be related to what may be termed degree
of involvement on the one hand and degree of familiarity on the other.
Both of these are indirectly related to difficulty but are conceptually
different from it.

To conclude, the means, across the 48 items, of the three Weibull
parameters in the two subgroups were as follows:

                 t_0 (sec.)      c       μ_0

OK subgroup         2.7        1.125    33.05

NO subgroup         1.1         .903    22.50

We did not discuss the scale parameter μ_0 in the foregoing, but in view
of its mathematical relation, $\mu = t_0 + \mu_0\,\Gamma(1 + 1/c)$, with
the theoretical mean of the distribution, it hardly needs discussion.
Since the mean of μ_0 is smaller in the NO subgroup than in the OK
subgroup, we may conclude that, on the average, the NO subgroup spent
less time per item than did the OK subgroup in the matrix algebra
pretest.
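The relation between the scale parameter and the theoretical mean is easy to evaluate numerically. The sketch below uses only the formula quoted above and the Item 16 estimates printed in the Figure 14 legend. The function name is illustrative, and reading the legend's "mean" column as the mean excess over t_0 is an inference from the numbers, not something the report states.

```python
import math


def weibull_mean(t0, mu0, c):
    """Theoretical mean: t0 + mu0 * Gamma(1 + 1/c)."""
    return t0 + mu0 * math.gamma(1.0 + 1.0 / c)


# Item 16 estimates from the Figure 14 legend.
ok_excess = weibull_mean(0.0, 38.47, 1.161)   # mean time beyond t_0, OK subgroup
no_excess = weibull_mean(0.0, 19.33, 0.3764)  # mean time beyond t_0, NO subgroup
ok_mean = weibull_mean(5.4740, 38.47, 1.161)  # full theoretical mean, OK subgroup

print(f"OK: mu0 * Gamma(1 + 1/c) = {ok_excess:.2f}")
print(f"NO: mu0 * Gamma(1 + 1/c) = {no_excess:.2f}")
print(f"OK: t0 + mu0 * Gamma(1 + 1/c) = {ok_mean:.2f}")
```

The first two values agree closely with the 36.51 and 76.68 printed as "mean" in the Figure 14 legend. They also show that a smaller μ_0 does not by itself guarantee a smaller mean, because the factor Γ(1 + 1/c) grows quickly once c falls well below 1.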






5.5 Correlations among Weibull Parameters and Item Statistics

The foregoing concludes the main analyses carried out on the
data from the original version of the matrix algebra pretest. To
explore other possible relations, however, the three Weibull parameters
and the maximum correlation between $\ln\ln[1/(1-P)]$ and
$\ln(t - t_0)$ that was found in the process of estimating the
parameters (see Section 3.1) for the OK subgroup and the NO subgroup
separately were correlated with the Kolmogorov-Smirnov p-values in the
OK subgroup and six other item statistics based on the total sample.
There were thus 15 variables in all, but the μ_0 in the NO subgroup had
to be omitted because of storage limitations. The resulting 14 x 14
correlation matrix is shown in Table 10, where the correlations
significant at the 5 percent level are asterisked. (All correlation
coefficients used as input variables were transformed into Fisher's Z
before being correlated with other variables.)
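For readers unfamiliar with the transformation mentioned in the parenthetical remark, a minimal sketch follows. Fisher's Z is the inverse hyperbolic tangent of r. The function names and the sample coefficients are hypothetical and are not taken from the report's data.

```python
import math


def fisher_z(r):
    """Fisher's Z: 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a Z value to a correlation coefficient."""
    return math.tanh(z)


# Hypothetical item-level correlations used as input variables.
item_correlations = [0.10, 0.50, 0.90, 0.98]
z_values = [fisher_z(r) for r in item_correlations]

for r, z in zip(item_correlations, z_values):
    print(f"r = {r:.2f} -> Z = {z:.3f}")
```

The transformation stretches the scale near r = 1, so coefficients that are bunched close to 1 (as maximum correlations from a fitting procedure typically are) become more nearly normally distributed before being correlated with other variables.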






Table 10

A Correlation Matrix of Weibull Parameters and Item Statistics

                              1       2       3       4       5       6       7       8       9      10      11      12      13      14

 1. t_0 (OK subgroup)     1.000
 2. max. correlation       .067   1.000
 3. c                     -.169    .185   1.000
 4. μ_0                     [?]*  -.184    .245   1.000
 5. No. of options        -.066   -.179    .316*   .438*  1.000
 6. p from Kolmo.          .095    .143   -.331*   .033    .023   1.000
 7. Difficulty 1          -.007    .?55    .406*  -.137    .113   -.611*  1.000
 8. r_s,i                 -.029    .109    .160    .100   -.049   -.216    .241   1.000
 9. r_s,ti                -.210    .124    .097    .340*  -.066    .271   -.492*  -.018   1.000
10. Difficulty 2           .016     [?]    .374*  -.169    .163   -.582*   .991*   .175   -.544*  1.000
11. Average time           .131   -.149    .183    .912*   .541*  -.015   -.004    .089    .219   -.030   1.000
12. t_0 (NO subgroup)      .016   -.048   -.003    .143    .080   -.167    .187    .009   -.189    .188    .259   1.000
13. max. correlation       .088    .000    .091    .117   -.108    .140   -.243   -.021    .222   -.262    .083   -.004   1.000
14. c                      .240   -.031   -.170   -.189   -.221   -.024   -.153   -.232   -.133   -.136   -.157   -.108    .186   1.000

*Significant at p < .05; i.e., |r| ≥ .285. N = 48 items.

Note. All correlations were converted by Fisher's Z-transformation. The
first 4 variables are of OK subgroup, the last 3 variables are of NO
subgroup.
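The significance threshold in the table footnote can be reproduced with a short calculation. This sketch assumes the threshold comes from the usual two-tailed t test of a correlation with N - 2 degrees of freedom, which the report does not state explicitly. The critical t value is taken from standard tables, not computed.

```python
import math

N_ITEMS = 48
DF = N_ITEMS - 2

# Two-tailed 5 percent point of Student's t with 46 degrees of freedom,
# as given in standard tables (an external value, not from the report).
T_CRIT = 2.0129

# Invert t = r * sqrt(df / (1 - r**2)) to get the smallest significant |r|.
r_crit = T_CRIT / math.sqrt(T_CRIT ** 2 + DF)

print(f"critical |r| for N = {N_ITEMS}: {r_crit:.3f}")
```

Under that assumption the result rounds to .285, matching the value given in the footnote.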




The Weibull parameter c of the OK subgroup correlates with
the number of options in the items (.316), difficulty 1 (.406), the
arcsine transform of the Kolmogorov-Smirnov p-values (-.331), and
difficulty 2 (.374). The negative correlation with the
Kolmogorov-Smirnov p-value is probably largely an artifact, because
items with large values of c tend to be easier and hence the OK
subgroups for those items tend to be larger, thus inflating the
"significance" and decreasing the p-values. When the difficulty index
is partialled out, the correlation between c and p drops to -.115,
which is nonsignificant.

The shape parameter in the OK group also correlates .316 with
the number of options, meaning that items with more choice options
(which ranged from two for true-false items to five for the
multiple-choice item with the largest number of alternatives) tended to
have larger c values. If our interpretation, suggested earlier, that
the item c-value reflects degree of engagement students show with the
item is correct, we may conclude that within the range represented, the
larger the number of options the greater the engagement students feel.
This seems reasonable since items with more alternatives present more
of a cognitive task and hence probably induce greater involvement on
the part of students. To put it the other way around, this observation
lends further support to our notion that c reflects degree of
engagement. It should be mentioned that partialling out Variable 7
(difficulty 1) does not affect the correlation (the partial r is .30);
hence the correlation between c and number of options cannot be
explained away by arguing that the larger the number of options the
more difficult the item tends to be.
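Both partial correlations quoted in this subsection can be recovered from the entries of Table 10 with the standard first-order partial correlation formula. The sketch below is an illustration only; the function name is ours, and the three-decimal inputs are the zero-order correlations as printed in the table.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations from Table 10 (OK subgroup c = variable 3,
# K-S p-value = variable 6, difficulty 1 = variable 7, options = variable 5).
c_vs_p = partial_r(r_xy=-0.331, r_xz=0.406, r_yz=-0.611)
c_vs_options = partial_r(r_xy=0.316, r_xz=0.406, r_yz=0.113)

print(f"c with K-S p, difficulty 1 held constant:       {c_vs_p:.3f}")
print(f"c with no. of options, difficulty 1 held const: {c_vs_options:.2f}")
```

The two results round to -.115 and .30, the values reported in the text.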

It is interesting, although rather disappointing, to note
that none of the Weibull parameters correlates with item discriminating
power, r_s,i.

Next, the scale factor μ_0 correlates moderately to highly
with the number of options (.488), r_s,ti (.340) and the observed
average time (.912). Since μ_0 is functionally related to μ (the
theoretical mean response time; see equation (2.7)), its very high
correlation with observed average time is also to be expected. This in
turn explains the moderate correlation between μ_0 and number of
options, since the larger the latter is the more time it would take, by
and large, to respond to that item.

The p-values from the Kolmogorov-Smirnov tests of goodness-of-fit
in the OK subgroup have correlations -.611 and -.582 with difficulty
1 and difficulty 2, respectively. This, too, is probably an artifact
to a large extent in that the p-values tend to decrease as sample size
increases, and large n for the OK subgroup means an easy item whose
"difficulty" indices would be large.

The correlation r_s,ti between item response time and total
test score in the total sample of 88 subjects correlates with μ_0
(.340), difficulty 1 (-.492) and difficulty 2 (-.544). The negative
correlation with the difficulty index has already been discussed in the
previous subsection. The positive correlation with the scale factor
μ_0 is difficult to interpret.

It should be noted that the Weibull parameters, t_0, r_max and
c, from the NO subgroup did not correlate significantly with any of the
other 11 variables. This is probably because the data analyzed here
are for the pretest, and members of the NO subgroup know little if any
matrix algebra. The situation changes considerably when we analyze the
posttest data in the next section.



6. ANALYSIS OF POST-REVISION DATA 



After the results from the original version of the matrix
algebra pretest were analyzed, and partly as a consequence of the
analyses, several items were modified to correct ambiguities in wording
or defects in the display. Another change made in the test was, as
mentioned earlier, that the option of pressing the NEXT key to go to
the next item without answering the previous one was eliminated. This
change was made at the request of the instructor of one of the
participating statistics courses who wanted to force the students to
answer all questions. In retrospect, however, this may have been a
change for the worse, for it has no doubt led to increased guessing.
At the risk of seeming to attribute to the Weibull distribution some
magical power to detect "undesirable" items, we note that the fit
became very poor for items in which a large increase of guessing must
have occurred, such as those testing for difficult material like
transformations. It could be that guessing contaminates the
distribution so that it no longer appears to result from a single
underlying stochastic process. How to handle this problem is something
we are not prepared to say at this time. That must be left to future
research.

In this section we discuss the analyses not only of data from
the revised pretest but those from the posttest as well. (In fact, the
analyses are mostly of the posttest data.) We must therefore first
describe those tests.

6.1 Description of Posttests (with Some Speculations)

It might have seemed strange that the analyses discussed so
far were confined to pretest data, with no mention of a posttest. This
was simply because no posttest had existed before Fall 1976. Thanks to
NIE funding, we were able to implement four posttests during that
semester. (Lest it be thought that funds were diverted from research
to instructional use, especially in the current atmosphere of censure
of mishandling of grant monies, it should be pointed out that the
posttest results were not used at all in determining grades in the
three statistics courses. Thus, these tests served our research
purposes only.)

Specifically, the tests come after completion of the lessons
on matrix multiplication, on matrix inversion, on transformations, and
on eigenvalue problems, and they are referred to as "Multpost,"
"Matinvtest," "Transtest," and "Eigtest," respectively. The items on
each constitute a subset of the 48 items on the pretest, there being 23
items in Multpost, 12 in Matinvtest, 7 in Transtest, and 8 in Eigtest.
(The numbers total 50, because the first two tests contain two items in
common.) The actual items, identified by which numbers they are in the
pretest, are shown below along with the number of subjects on whom
usable data for each test were available. (The interested reader may
refer to Appendix B to see what the items on each posttest are.)

                                    Number of
                Items               Subjects

Multpost        1-18; 25-29            68

Matinvtest      17-24; 30-33           30

Transtest       34-40                  38

Eigtest         41-48                  56



We mention in passing that many students complained, in their responses
to an open-ended question included in the questionnaire attached to the
lesson, that some of the items in Transtest tested for material beyond
what was taught in the lesson on transformations. They claimed that
these items were too mathematical and advanced for the students to whom
the test was addressed. Despite these complaints, however, we did not
modify these items because we were curious to see whether the Weibull
fitting would be adversely affected by the lesson-unrelatedness of the
items.

Three Items with Matrices Constructed by a Random Number
Generator. In three of the items, the elements of the matrix in the
stem were generated by a random number generator, so students would not
get identical matrices to work with. Each had a parallel, counterpart
item in which the elements of the matrix were fixed. We were curious
to see whether the two types of item would lead to equal degrees of fit
to the Weibull. If not, it would indicate that the time taken for the
sheer arithmetical calculations, which may vary from version to version
of the randomly generated items, plays an important part in the total
time taken for the item, thus leading to a more complicated
distribution with separate Weibull processes for the arithmetic and the
matrix algebra parts. A twofold Weibull convolution might then offer a
better fit to the "random" items while a regular Weibull would fit the
"fixed" items. Alternatively, if the component parts are successfully
modeled by one- (two-)parameter negative exponential distributions,
then both the fixed and random items would be well fitted by two-
(three-)parameter gamma distributions with the random items having a
value for c greater by approximately 1 than the fixed items.
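The last conjecture rests on a standard fact: the sum of k independent exponential stage times with a common rate follows a gamma distribution with shape k, so one extra stage raises the shape by 1. The simulation below is a sketch under that equal-rate assumption. The stage counts, the rate, the sample size and the method-of-moments shape estimate are all illustrative choices, not the report's procedure.

```python
import random
import statistics


def moment_shape(samples):
    """Method-of-moments estimate of the gamma shape: mean**2 / variance."""
    mean = statistics.fmean(samples)
    variance = statistics.pvariance(samples, mu=mean)
    return mean * mean / variance


def simulate_stage_sum(n_stages, rate, n_samples, rng):
    """Total time as a sum of independent exponential stage times."""
    return [
        sum(rng.expovariate(rate) for _ in range(n_stages))
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical: a "fixed" item needs 2 stages, its "random" twin 3 stages.
fixed_times = simulate_stage_sum(2, rate=0.1, n_samples=20000, rng=rng)
random_times = simulate_stage_sum(3, rate=0.1, n_samples=20000, rng=rng)

shape_fixed = moment_shape(fixed_times)
shape_random = moment_shape(random_times)

print(f"estimated shape, fixed item:  {shape_fixed:.2f}")
print(f"estimated shape, random item: {shape_random:.2f}")
print(f"difference:                   {shape_random - shape_fixed:.2f}")
```

If the stages had unequal rates, the sum would no longer be exactly gamma distributed, which is one reason the prediction is stated only as approximate.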

Posttests Should Make Students More Seriously Involved in
Solving Items. Since the conditions under which pretests and posttests
are taken differ considerably, we expect that the incidence of guessing
will differ in the two cases. Specifically, we expect that guessing
will be minimized in the posttests while the press to guess would be
greater in the pretest, especially when the option to skip items is
eliminated. Also, the knowledge of matrix algebra newly acquired
after the students have gone through the lessons will have led them
to be more involved with answering the test questions. Consequently,
the CRR should be monotonically increasing with time rather than
decreasing or remaining constant. We would therefore expect c to be
greater than 1 for posttest items. On the other hand, if the student
doesn't know or has forgotten the material, we would expect him/her to
be more likely to give answers by random guessing.

6.2 Results of Analyses

Tables showing p-values and z-values from the Kolmogorov-Smirnov
tests of goodness-of-fit and the Weibull-parameter values for
the items in the four posttests are given in Appendix E. Here we
discuss only the summary results and their implications. Table 11
shows the percentages of items having Kolmogorov-Smirnov p-values
greater than .20, greater than .40 and greater than .50 and the mean
p-values for the four posttests in the two subgroups. Also shown for
comparative purposes are the mean p-values in the pretest.



Table 11

Percentages of Items with Kolmogorov-Smirnov p-values
Exceeding Three Values in the Two Subgroups

                   p > .20    p > .40    p > .50    Mean p    Pretest mean p

OK Subgroup
  Multpost           87%        83%        78%                    .46
  Matinvtest        100%        92%        75%       .64          .55
  Transtest          71%        57%        43%       .50          .51
  Eigtest           100%       100%       100%       .79          .78

NO Subgroup
  Multpost          100%       100%       100%       .82
  Matinvtest*                                                     .50
  Transtest          71%        71%        57%       .54          .43
  Eigtest           100%       100%        88%       .80          .27

*Insufficient data

The distributions of time-score data from all but the Transtest
fitted the Weibull distribution much better than did those of the
revised pretest, for which only [illegible] percent of the items had
p-values in excess of .20 in the OK subgroup and 75 percent in the NO
subgroup.

The items in the pretest corresponding to those in the Transtest had
an average p-value of .51 in the OK subgroup and .43 in the NO
subgroup, as shown in Table 11. The average p-values in the Transtest
in the two groups, on the other hand, were .50 and .54. As we
mentioned earlier, there were some items in the Transtest that covered
material not taught in the lesson, inviting much student complaint.
Again, something unusual in the items seems to result in poorer Weibull
fit. Apart from this, the mismatch between the lesson and the test
should play a role in the study of the importance of the linkage
between lesson and items in measuring the effectiveness of instruction
as well as in assessing the level of a student's learning. Such a
study is planned in a forthcoming project.

Let us examine in some detail the results for the problematic
test, Transtest, which included items covering material not taught in
the lesson. (It may be mentioned in passing that some students
expressed irritation and hostility, while others thought they had
missed something in the lesson and went back to repeat it.) For
comparative purposes, the averages of the Weibull parameters t_0 and c
for items in three posttests with adequate numbers of subjects in both
the OK and the NO subgroups are shown below.

                               t_0       c

Multpost     OK subgroup      4.98     1.11     (based on 8 out ... for NO)
             NO subgroup      9.45     1.06

Transtest    OK subgroup      6.83     1.04     (7 items)
             NO subgroup      5.56      .88

Eigtest      OK subgroup      7.84     1.22     (8 items)
             NO subgroup      6.09     1.28



In brief, the Transtest averages alone out of the three
posttest averages shown above exhibit a pattern typical of that for a
pretest item, which was exemplified by Item 16 in the previous section
(see page 44). That is, the NO subgroup has smaller values for both
t_0 and c than does the OK subgroup. In particular, the c value in
the NO subgroup is smaller than 1 while that in the OK subgroup is
greater than 1. (Note that in neither of the other two posttests do
the average t_0 and the average c show all of these relations between
OK and NO subgroups.) Thus we may conclude that the Transtest,
although a posttest, acted much like a pretest by virtue of the
anomalous items. Again, there is corroboration of our speculation that
c indicates extent of engagement or involvement on the part of students
in an item. The fact that the average c for Transtest in the NO
subgroup is only .88 suggests that many students merely guessed at the
answers for the three items in this test that covered material beyond
the scope of the lesson, which is only to be expected.

As mentioned earlier, three items used matrices whose elements
were chosen by a random number generator supplying integers between
-9 and 9 inclusive. Specifically, Item 1 asked for the sum of two
3 x 3 matrices with fixed elements; Item 2 was a parallel item with
random elements. Items 3 and 4 called for the difference between two
3 x 3 matrices with fixed and random elements, respectively. Items 5
and 6 asked for the transpose of a 3 x 3 matrix, again with fixed and
random elements respectively. That is, the odd-numbered items used
fixed matrices while the even-numbered items used random matrices.

Table 12 shows the Kolmogorov-Smirnov p-values and the Weibull
parameters c and μ_0 for these three pairs of items in the revised
pretest and in the posttest for the OK and NO subgroups (except that
the latter subgroup is nonexistent for the posttest because everyone
got all six items correct). Note that for the OK subgroup the
even-numbered items have considerably larger c values than do their
odd-numbered counterparts in the revised pretest, while the reverse is
true for the NO subgroup. Interpretations will be attempted after the
findings have been stated factually.



Table 12

Comparison of Items Using Fixed Matrices (Odd-numbered)
and Parallel Items (Even-numbered) Using Random
Matrices in Terms of Kolmogorov-Smirnov p-values
and the Weibull Parameters c and μ_0

                      OK Subgroup                        NO Subgroup
             Pretest               Posttest                Pretest
Item      p     c     μ_0       p     c     μ_0        p     c     μ_0

 1       .06   1.15   44.5     .33   1.06   25.3      .99   2.08   52.6
 2       .02   1.92   28.1     .26   2.76   23.6      .99          32.1
 3       .64   1.24   23.3     .94   1.24   17.7      .41   1.05   19.3
 4       .51   2.19   32.4     .17   1.05   20.9      .66   1.02   14.5
 5       .45   1.25   13.?     .38   1.00   10.9      .99          17.1
 6       .16   1.35   12.7     .08   ?.41   11.6      .42    .85   10.8



Next the conditional response rates (CRR) of Items 1 and 2
are compared in Figure 16 for the OK subgroup and in Figure 17 for the
NO subgroup. In the OK subgroup it is Item 2, which uses the random
matrix, that has a larger and monotonically increasing CRR while the
CRR for Item 1 is smaller and almost parallel to the horizontal axis.
In other words the conditional probability (given that the item hasn't
been answered up to then) that it will be answered the next moment is
always greater for Item 2 than it is for Item 1. Note that, on the
posttest, both items were answered correctly by all subjects and hence
traditional item analysis based on the performance score would fail to
show any difference between these two items that are identical in
framework but differ in the way the matrix elements were chosen. Our
analysis based on time scores has revealed an interesting difference,
as the c-values in Table 12 show (although the Items 3 and 4 pair is an
exception).

The CRRs of Items 2, 4 and 6 were larger than those of Items
1, 3 and 5, respectively, in the OK subgroup on the revised pretest,
but the reverse was true in the NO subgroup. That is to say, the
conditional probability that Item 1 will be answered wrong in the next
moment increased with time while the same conditional probability
decreased for Item 2. Similar remarks hold for the Items 3 and 4 and
Items 5 and 6 pairs. Figures for these pairs in both subgroups are
shown in Appendix F.

Interpretations. The complicated findings reported above are
difficult to interpret, and what follows must be regarded as attempts
rather than definitive interpretations. The three pairs of items were
very simple for students taking advanced statistics courses in
education and psychology once they learned the definitions of matrix
addition, subtraction and transposition. The only difficulty probably
occurred in the addition and (more so) the subtraction of signed
numbers, and in the case of transposition, the requirement to rapidly
perceive and distinguish among five 3 x 3 matrices (the options) with
the same set of numbers in different arrangements. We surmise that the
items with fixed matrices (Items 1, 3 and 5) were so easy for the OK
subgroup that the answers were arrived at without much "involvement" on
the part of the students. But some versions of Items 2, 4 and 6
probably required greater attention and involvement of the students,
depending on particular combinations of numbers chosen by the random
number generator.



[Figure 16. Comparison of conditional response rates of items 1 and 2
for OK subgroup. Horizontal axis: time in seconds. Legend as printed:
item 1: c = 2.079, t_0 = 0.0001, μ_0 = 52.64; item 2: c = 0.4865,
t_0 = 11.4871, μ_0 = 32.09.]

[Figure 17. Comparison of conditional response rates of items 1 and 2
for NO subgroup. Horizontal axis: time in seconds. Legend as printed:
item 1: c = 1.150, t_0 = 10.5903, μ_0 = 44.51; item 2: c = 1.924,
t_0 = 2.5260, μ_0 = 28.12.]

Let us now return to Table 12 and examine the Weibull
parameters for the posttest. In item pairs 1, 2 and 5, 6 they followed
exactly the same pattern as they did for the pretest. In fact, the
differences in c values between the odd- and even-numbered members of
the pairs were greater for the posttest than they were for the pretest.
Referring to equation (2.4) for the Weibull CRR function, it can be
seen that for fixed c, this is a monotonically decreasing function of
μ_0. Since all μ_0's for the posttest were smaller than the
corresponding μ_0's for the pretest, the CRR values at any given time
point were mostly larger for the posttest than they were for the
pretest. In item pair 3, 4 the pattern was not the same for the
posttest as it was for the pretest.
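The monotonicity claim can be verified in one line of calculus. Equation (2.4) itself lies outside this passage, so the block below assumes the standard Weibull hazard function in the notation used here; it is offered as a sketch, not a reproduction of the report's derivation.

```latex
% Assumed form of the Weibull conditional response rate (hazard):
h(t) = \frac{f(t)}{1 - F(t)}
     = \frac{c}{\mu_0}\left(\frac{t - t_0}{\mu_0}\right)^{c-1}
     = \frac{c\,(t - t_0)^{c-1}}{\mu_0^{\,c}},
\qquad t > t_0 .

% Holding c and t_0 fixed and differentiating with respect to \mu_0:
\frac{\partial h}{\partial \mu_0}
     = -\,\frac{c^{2}\,(t - t_0)^{c-1}}{\mu_0^{\,c+1}} \;<\; 0 .
```

The derivative is negative for every c > 0, so a smaller μ_0 gives a larger rate at each time point. The conclusion holds exactly only when c and t_0 are unchanged, which is presumably why the text says the posttest values were "mostly" larger.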

For members of the NO subgroup, the materials of these three
item pairs were relatively new and unknown or were forgotten. As time
went by the CRR's slowly increased for the fixed-number Items 1, 3 and
5, while this was not the case for Items 2, 4 and 6. (See the figures
in the Appendix.) This means that the students who answered the
fixed-number items wrong tried harder, on the average, to figure out the
right answers than did the students who answered the random-number items.
Two of the latter items (Nos. 2 and 6) were given up quickly on the
average, again suggesting that some number combinations chosen by the
random-number generator led to difficulties. In the exceptional pair,
Items 3 and 4, the c values were almost equal and close to one (1.05
and 1.02) but the μ0's were somewhat different (19.3 and 14.5). Therefore
the CRR curve for Item 4 lies above that for Item 3, and both are
almost horizontal. Recalling that Item 4 differs from Item 2 only in
that the operation is subtraction instead of addition, we infer that
the difficulty of subtraction of signed numbers depends strongly on
the particular pair of numbers involved. This must be the reason why
the item pair 3, 4 showed a different behavior from item pairs 1, 2
and 5, 6 in intrapair differences.
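
The comparison of the two CRR curves can be checked numerically. The sketch below assumes the usual three-parameter Weibull hazard, (c/μ0)((t - t0)/μ0)^(c-1), as the form of equation (2.4), which is not reproduced in this section. It also assumes that the reported values pair up in the order listed (Item 3: c = 1.05, μ0 = 19.3; Item 4: c = 1.02, μ0 = 14.5) and sets t0 = 0 purely for illustration.

```python
def weibull_crr(t, t0, c, mu0):
    """Conditional response rate (hazard) of a three-parameter Weibull.

    Assumed form: (c / mu0) * ((t - t0) / mu0) ** (c - 1).
    """
    if t <= t0:
        raise ValueError("t must exceed the location parameter t0")
    return (c / mu0) * ((t - t0) / mu0) ** (c - 1.0)


# c and mu0 as quoted in the text; t0 = 0 is an assumption for illustration.
for t in (10.0, 20.0, 40.0):
    item3 = weibull_crr(t, t0=0.0, c=1.05, mu0=19.3)
    item4 = weibull_crr(t, t0=0.0, c=1.02, mu0=14.5)
    print(t, round(item3, 4), round(item4, 4))
```

With c close to one the exponent c - 1 is nearly zero, so each curve stays close to c/μ0 and the item with the smaller μ0 has the higher rate.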

7. WEIBULL AND GAMMA FITS COMPARED



In this section we compare the relative goodness of fit of
the Weibull and two-parameter gamma distributions to time-score data
from various sources and attempt to explain when and why each offers
the better fit. As a general rule, it seems that for material requiring
a sustained, uniform thinking process the Weibull has an edge over the
gamma. The Weibull also shows wider applicability and greater
flexibility. On the other hand, if the task consists of a concatenation
of several relatively independent and more or less simple, mechanical
subtasks or stages, the gamma fit seems better. Of course pure cases of
either type are rare, and often the fits are ambivalent. Cases for
which neither distribution offers an adequate fit seem to be ones in
which the several stages of a task are non-independent or
non-mechanical or both. Compound Weibull distributions would probably
show good fits in such cases.

Due to the abundance of fittings undertaken, the Kolmogorov-Smirnov
p-values and the Weibull and gamma parameter estimates are
shown in Appendix E, and only summary tables are given in this section.
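
As a reminder of what the Kolmogorov-Smirnov comparison measures, here is a minimal sketch of the test statistic: the largest vertical gap between the observed cumulative step function and a fitted distribution function. The Weibull form 1 - exp(-((t - t0)/μ0)^c) is an assumed parameterization, the response times are hypothetical, and the conversion of the statistic to a p-value is omitted.

```python
import math


def weibull_cdf(t, t0, c, mu0):
    """Three-parameter Weibull CDF (assumed form); zero at or below t0."""
    if t <= t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / mu0) ** c))


def ks_statistic(times, cdf):
    """Largest vertical gap between the empirical step function and cdf."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Gap just before the step at x, and gap just after it.
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


# Hypothetical response times in seconds (not data from this report).
times = [9.5, 11.0, 12.4, 14.1, 15.0, 17.3, 19.8, 23.5, 28.0, 36.2]
d = ks_statistic(times, lambda t: weibull_cdf(t, t0=8.77, c=1.17, mu0=13.13))
print(round(d, 3))
```

A small statistic corresponds to a large p-value, which is why higher p-values are read throughout this section as better fits.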






7.1 Multiplication Pretest and Posttest 



Data from a sample of 56 examinees who took both the pretest
and posttest for the simple operations and matrix multiplication lessons
were subjected to goodness of fit testing for Weibull and gamma
distributions. The parameters of the Weibull and gamma distributions
were estimated from the time-score data of each item, and the fits were
tested by the Kolmogorov-Smirnov test. Items were classified into four
categories according to the p-values from the Kolmogorov-Smirnov testing:



(1) the p-value for Weibull (p_W) is much better than
    the p-value for gamma (p_G); (p_W - p_G ≥ .10)

(2) p_W is better but not much; (.10 > p_W - p_G > 0)

(3) p_G is better but not much; (.10 > p_G - p_W > 0)

(4) p_G is much better than p_W; (p_G - p_W ≥ .10).
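
The four categories amount to a simple rule on the difference of the two p-values. The sketch below is one way to write it; the function name, the rounding step that guards against floating-point noise at the .10 boundary, and the handling of exact ties (which the four categories do not cover) are assumptions, and the example p-values are hypothetical.

```python
def fit_category(p_w, p_g, margin=0.10):
    """Return category 1-4 as defined above, or None when p_w equals p_g."""
    diff = round(p_w - p_g, 6)  # rounding avoids float noise at the boundary
    if diff >= margin:
        return 1          # Weibull much better
    if diff > 0:
        return 2          # Weibull better, but not by much
    if diff == 0:
        return None       # tie: not covered by the four categories
    if -diff < margin:
        return 3          # gamma better, but not by much
    return 4              # gamma much better


# Hypothetical (p_W, p_G) pairs, one per category.
pairs = [(0.80, 0.45), (0.52, 0.47), (0.30, 0.36), (0.20, 0.75)]
print([fit_category(p_w, p_g) for p_w, p_g in pairs])  # [1, 2, 3, 4]
```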



In order to show which theoretical distribution is better for the 23
items, the frequencies in each category were counted and summarized
in Table 13.



Table 13

Comparison of Goodness of Fit for Weibull and Gamma in Multpost

                 Pretest (23 items)             Posttest (23 items)
Category    OK Subgroup   NO Subgroup*     OK Subgroup   NO Subgroup**

   1         13 (57%)       9 (47%)         16 (70%)       6 (75%)
   2          4 (17%)       4 (21%)          2 (9%)        1 (12.5%)
   3          3 (13%)       6 (32%)          3 (13%)       0
   4          3 (13%)       0                2 (9%)        1 (12.5%)

 * Four items were omitted because of small N.

** Only eight items had N ≥ 12.



The Weibull distribution is distinctly preferred for both
the OK and NO subgroups in the pretest and the posttest, but the
posttest shows a slightly higher percentage in Category 1 than the
pretest does. Only a few items fall in Category 4 in each group
of the pretest and posttest. It can be said of 87 percent, 100 percent,
and 92 percent of items in both groups of the pretest and the OK subgroup
of the posttest that the cumulative distributions of their time-score
data are pretty well approximated by Weibull distribution functions.

The average p-values and standard deviations of each group
for both distribution functions are given in Table 14.



Table 14

Average p-values for Weibull and Gamma Distributions;
Multpost (23 items and N = 56)

         Pretest                        Posttest

OK Subgroup   NO Subgroup      OK Subgroup   NO Subgroup



p w 




•75 


.65 


.82 


.30 


.27 

4 > 


.16 


.6.3 


.39 


.50 


.35 


. .37 • 


-.35 



P G 



The average values of p_W in the four columns are larger than
those of p_G in the same columns, and the standard deviations of p-values
for Weibull (SD_W) are smaller than the standard deviations of p for gamma
(SD_G). The values of p_G fluctuate considerably more than those of p_W.
The result that Weibull is better was expected because items in the
matrix algebra test were not easy, and many levels of knowledge that
are hierarchically or linearly related would have been required to
arrive at their responses. A gamma distribution is a convolution of a
finite number of independent negative exponential variables, which
Restle (1962) interpreted as representing independent stages or
components of a problem solving process. It therefore cannot explain
theoretically the time-score data from matrix algebra test items where
the stages or components are not independent. Indeed, for most of our
items there is no way we can say that the stages to reach a response to
a given item are independent from one another. Since the Weibull
distribution does not require such a strong assumption, no matter how
each stage relates to the others, a whole process of cognitive tasks to
reach a response can be modeled by a Weibull distribution. The response
can be positive or negative as long as the students' processes of
achieving their cognitive task can be considered to be of the same kind.
This means that the OK subgroup in the posttest and the NO subgroup in
the pretest follow different processes of thinking for reaching their
responses, but subjects in the OK subgroup may be following a very
similar thinking process to reach their responses, and so are subjects
in the NO subgroup for the pretest.

The NO subgroup in the pretest may be characterized as follows:
many examinees gave up trying hard on a problem and responded by guessing
their answer, while the students in the NO subgroup in the posttest
tried hard and spent longer times but unfortunately their answers were
wrong. A close examination of the CRR would tell more about these
relations.

It is interesting to note that the p_G of the pretest was .59
but it dropped to .39 for the posttest in Table 14, and for the NO
subgroup it dropped from .63 to .50, while the p_W of both groups did not
change so much. As we mentioned earlier in this paper, if responses
to a given item occur on a random basis, then the time-score data follow
a negative exponential function. Gamma is a convolution of such
negative exponential functions. It is probably true that the number
of examinees who answered randomly by guessing was larger for the
pretest than for the posttest, by which time everybody had
learned the material already.

The Revised Pretest, the Case of N = 100 and 48 Items. Although
the original version of the pretest was designed so as to minimize the
guessing effect on the time-score data, its data were not analyzed
for the comparative study of goodness of fit of the Weibull and gamma
distributions. The revised version of the pretest has a matched sample
of Multpost as a subset of the N = 100, the whole pretest sample, for
Items 1-18 and 25-29 out of the 48 items. These 23 items were analyzed
in the previous subsection, but the summary of the pretest, 48 items, is
given below.



Table 15

Comparison of Weibull and Gamma Fitting for
the Revised Pretest (48 items and N = 100)

Category*     OK Subgroup     NO Subgroup

   1           29 (60%)        22 (46%)
   2            6 (13%)        11 (23%)
   3           11 (23%)        12 (25%)
   4            2 (4%)          3 (6%)

* 1 = {items with p_W - p_G ≥ .10};
  2 = {items with 0 < p_W - p_G < .10};
  3 = {items with 0 < p_G - p_W < .10};
  4 = {items with p_G - p_W ≥ .10}.



In Table 15, only two items for the OK subgroup and three items
for the NO subgroup fell in Category 4; that is, 94 percent to 96 percent
of the 48 items are favorable to Weibull distributions for both the OK
and NO subgroups. The average p-values and standard deviations are very
similar to those of the 23 matched pretest and posttest items.



Table 16

Average p-values for Weibull and Gamma; the Revised
Version of Pretest (48 items and N = 100)



Distribution 


Mfean S.D. 


OK Subgroup 


NO Subgroup 


Weibull 




- -59 


.52 






.33 

r 


.36 










Gamma $ 


P G 


. -47 


.40 


* 


SD G 




..41 






** 





The average values of p_W and p_G for the OK and NO subgroups
are about .10 smaller than the average values shown in Table 14 and the
standard deviations in Table 16 are larger than those in Table 14. But
it is obvious that our observation in the previous section is applicable
to these data as well.

The Posttest: Matinvtest, Transtest, and Eigtest. Three more
posttests were analyzed by the Kolmogorov-Smirnov test. These samples
were not matched with the pretest sample. A close examination of these
data led to the same conclusion as in the previous two subsections,
with more emphasis on the fact that Weibull distributions are more
suitable than gamma distributions for the posttest items in the matrix
algebra test. Tables 17 and 18 summarize our observations.

We have only two items which fall in Category 4 in Table 17,
one each in Transtest and Eigtest. It is natural to wonder which items
fall in Category 4 and why their time scores fit gamma better. We will
pick up such items and discuss further details in the following subsection.




Table 17

Comparison of Weibull and Gamma Fitting;
Matinvtest, Transtest and Eigtest

             Matinvtest             Transtest                     Eigtest
             (12 Items)             (7 Items)                     (8 Items)
Category    OK Subgroup*    OK Subgroup   NO Subgroup    OK Subgroup   NO Subgroup

   1          9 (75%)         3 (43%)       4 (57%)       5 (62.5%)     2 (25%)
   2          1 (8%)          2 (29%)       2 (29%)       1 (12.5%)     1 (12.5%)
   3          2 (17%)         1 (14%)       1 (14%)       1 (12.5%)     5 (62.5%)
   4          0               1 (14%)       0             1 (12.5%)     0

* Almost everybody got all 12 items correct, so the NO subgroup
  is almost empty.



Table 18

The Average p-values for Weibull and Gamma; Matinvtest, Transtest and Eigtest



Matinvtest 



Transtest 



Mean 



, Eigtest 



Distribution S.D. OK Subgroup OK Subgroup NO Subgroup "OK Subgroup NO Subgroup 



Weibull 


p w 


.64 


. 50 


.77 


.79 


.80 




SD w 


-.21 . 


.37 


/ 
.34 


.16 

*» 


.21 


Gamma 




.33/ 


.42 




.65 


.76 






.3/8 


.42 


.34 


.22 


.21 


* Almost everybody got all 12 items correct, so no analysis for the NO
  subgroup was carried out.





Items Whose Time Scores Fit Gamma Better. We found that
Weibull distributions are generally more appropriate than two-parameter
gamma distribution functions for approximating the cumulative
distribution of the item time-score data from the matrix algebra test.
But there are a few items whose p-values from goodness-of-fit testing
are favorable to gamma. Of course, our sample size is not large enough
and the observations are restricted to only 48 items in pretest and
posttest, therefore it is dangerous to conclude that the Weibull
definitely is our distribution. Besides, there are many psychological,
intellectual and physical causes that individually or collectively may
be responsible for reaching responses at any particular instant. It is
impossible to isolate these causes and mathematically account for all of
them, therefore the choice of response time distribution is still
subjective and cannot be completely scientific. With these difficulties,
it is necessary to appeal to a reasoning that makes it possible to
distinguish between the different distributions on the basis of logical
considerations.

We hope that the reasoning developed in the previous sections
as to why most time-score data of items in the matrix algebra test
favor Weibull distributions is convincing to the readers.
We must argue now why these few items favor gamma. They
are Items 2, 4, 6, 9 and 25 for the OK subgroup on the pretest, 18 and
42 for the same group on the posttest, and Items 18, 27, and 34 of the
pretest, and Item 17 of the posttest for the NO subgroup.

Items Presented in the Posttest Without Proper Instructions.
Items 17 and 18 fell in Category 4 when they occurred in Multpost but
when they occurred in Matinvtest, both items went into Category 1.
Because they test knowledge of the determinant, which is not taught
in the multiplication lesson, a great number of examinees in the Multpost
sample did not know about the determinant of a matrix. This complaint
was confirmed by the students' open-ended questionnaire. Transtest invited
the same complaint. Transtest items in Table 17 show more favor to gamma
than items in Matinvtest and Eigtest for the OK subgroups. Now, let us
go back to Items 17 and 18. Items 17 and 18 and some items in Transtest
are the only items that were given to the students prior to the related
lessons being taught (for some, the related topics would never be taught
in the series of matrix algebra lessons), so their responses might have
been reached by different causes. Since these items were well fitted to
Weibull in the pretest, psychological effects might be disturbing the
determination of t0, the minimum response time, and of the CRR
(conditional response rate) or the shape parameter c.

Items 2, 4, 6 and 9. As mentioned before, we experimented
with two types of items: one type used fixed numbers in each element
of matrices, the other used a random number generator to fill in each
element. Items 1, 3 and 5 ask for addition, subtraction and transpose
of 3 x 3 matrices with fixed elements while Items 2, 4 and 6 ask for
the same operations with randomly supplied integers between -9 and 9
inclusive. Item 9 asks for multiplication of a matrix by a scalar, and
the random number generator supplies integer elements between -9 and
9 except for zero. Comparison of Kolmogorov-Smirnov p-values is shown
in Table 19.






Table 19

Comparison of p-values for Weibull and Gamma;
Items 1 through 6 and 9, Matched Sample,
N = 56, OK Subgroup



Pretest Posttest 



Items p w , Weibull p G> Gamma p w> Wek^ull 'p , Samma 



1 


.69 


.03 


.63 


2 


.27 


.77 


.73 


3 


.89 


.76 


.93 


A 


.99 


1.00 


.57 


5 


.80 


.12 


.93 


6 


.15 


.27 


.18 


9 


.88 


.92 1 


.93 



.ob 

.67 
.93 
.46, 

.20 
.76 



Items 2, 4, 6 and 9 have higher p-values from the Kolmogorov-
Smirnov test for gamma than for Weibull in the pretest, while Items 1, 3
and 5 have higher p-values for Weibull than for gamma in the pretest.
But in the posttest, the p-values for Weibull became higher than or
almost equal to those for gamma.

Weibull distributions are determined by three parameters,
the minimum time t0, a shape parameter c, and a scale parameter μ0,
while our gamma distribution has only two parameters, without a location
parameter t0 (or minimum time). Graphic display of both the cumulative
distribution of time-score data and the theoretical distribution
function on the same PLATO screen often shows that the smooth gamma
curve did not fit the cumulative distribution step function near the
initial point t0.

The examples shown in Figures 18 and 19 explain the situation
intuitively. Two-parameter gamma distributions lack the capacity to
provide information about the minimum time t0, unlike a three-parameter
gamma which has a location parameter.



Since Items 1, 3 and 5 have fixed number elements in the
matrices, the degree of difficulty due to calculation for each item
is constant and does not vary from one presentation to another, while
Items 2, 4 and 6, having a different set of numbers as elements in the
3 x 3 matrices, lead to different difficulties in calculations. For
example, 9-1 is much easier than -9-(-1), especially for those who are
wondering how to do the subtraction of two matrices. Thus t0, the
minimum time to respond to an item such as Items 2, 4, 6 and 9, can
differ from one presentation to another, depending on what kind of
numbers were picked by the random number generator to supply as matrix
elements. It may be impossible to determine a unique t0-value for these
items. The Weibull distribution requires a location parameter t0 to be
estimated from the observed data, and it is impossible to estimate such
a value when a single t0 really does not exist. Maybe this is the reason
why the time-score data from these items don't fit Weibull distributions
so well, in comparison with gamma distributions that do not require a
unique location parameter t0.

Figure 18  Goodness of fit test for the time-score data and Gamma distribution function (PLATO display: OK Group, question number 5, n = 53; observed step function and fitted gamma curve against response time)

Figure 19  Goodness of fit test for the time-score data and Weibull distribution function (PLATO display: OK Group, question number 5, n = 53; t0 = 8.77, max. corr. = 0.99, c = 1.17, μ0 = 13.13; observed step function and fitted Weibull curve against response time)

In the posttest, the different degrees of difficulty caused
by a choice of different sets of numbers became negligible. Students
had already learned the simple matrix operations and had plenty of
opportunities to practice them before taking their posttest. Therefore
the discrepancy among t0's, varying from item to item due to the
difficulty of calculation, would have been minimized and become
negligible also. That is probably why the Kolmogorov-Smirnov p-values
for the Weibull distribution of the posttest improved a great deal, as
shown in Table 19.



7.2 Exercises in Matrix Algebra Test that Require Only Mechanical Practice 



The matrix multiplication lesson includes eight sections
with exercises at the end of each instruction. These sections are as
follows.

1. Multiplication of A and B

2. AB ≠ BA

3. Scalar product

4. Matrix product

5. Quadratic form

6. The principles of matrix operation

7. Diagonal matrix

8. Scalar matrix and Identity matrix

Each exercise has the following format where all elements in matrices
are supplied by the random number generator.

All items in each exercise are very easy, straightforward
examples of what the students have learned in the previous instruction.
Therefore each problem involves only mechanical calculation rather than
requiring heavy reasoning or thinking. As in Figure 20, each exercise
requires simple repetition of calculating a scalar product, and hence
a strong similarity can be seen to Rasch's model in which the
distribution of time taken to read a passage of N words follows the
two-parameter gamma distribution. The repetition of N mechanical
calculations corresponds to reading N words, we think.

The time data for a student's first try only were sorted out
and goodness of fit tests were run. A summary is given in
Table 20. The time-score data from these exercise sections fit the
gamma distribution better than the Weibull distribution.

Figure 20  An example of the exercises in the matrix multiplication lesson (PLATO display: matrices and vectors with randomly generated integer elements, followed by a numbered menu of four scalar-product problems such as w'u and u'u)

Table 20

p-values for Weibull and Gamma: Exercises

Section of Exercise     p_W     p_G      N

e02.1                   .54     .82     74
e02.2                   .25     .47     61
e02.3                   .45     .57     67
e02.4                   .68     .95     53
e02.5                   .98     .98     16
e02.6                   .88     .90     39
e02.7*

Note: Average p-values: .63 for p_W and .78 for p_G.

* Data in this section was lost.



Recall that the items generated by the random number generator,
2, 4, 6 and 9, in the matrix algebra test had a tendency for
their time data to show favor to the gamma distribution in the pretest.
But in this case, students took exercises after completion of the
related instruction, so the argument about the difficulty of determining
the minimum required time to respond to a given item in the
pretest situation cannot be applied to the situation here. Weibull
became a better fit for Items 2, 4, 6 and 9 in the posttest situation.
We will need another line of reasoning to explain why gamma is better
than Weibull in the exercises after the instruction.

The two-parameter gamma distribution is a convolution of
k independent variables which each follow identical negative exponential
distribution functions, and the negative exponential distribution
can be obtained by considering the waiting time between arrivals in a
random process. Rasch (1961) constructed his oral reading model
(word reading model) by drawing an analogy with a simple problem in
telephony: the occurrence of a telephone call as a random event
determined by a "calls intensity" parameter which is stable over a
certain length of time. In the exercise unit shown in Figure 20, each
question involves four simple calculations: three of them require
multiplying the ith element of one vector by the ith element of
the other vector, and the last calculation involves adding up the three
results of multiplication to get the scalar product. We view these
operations as being simple and mechanical enough to identify them with
reading the k words which were used in Rasch's word reading model.
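
The stage argument can be illustrated by simulation. The sketch below adds up k independent negative exponential stage times and compares the sample mean and variance of the totals with the values k/rate and k/rate^2 that a two-parameter gamma with integer shape k would give. The choice k = 4 mirrors the four calculations in a scalar-product question; the rate of 0.5 per second and the sample size are hypothetical.

```python
import random
import statistics


def simulate_stage_times(k, rate, n, seed=0):
    """Total time of k independent exponential stages, repeated n times."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(rate) for _ in range(k)) for _ in range(n)]


k, rate = 4, 0.5  # hypothetical: 4 stages, each with mean 1/rate = 2 seconds
totals = simulate_stage_times(k, rate, n=20000)

# Sample moments next to the gamma values k/rate and k/rate**2.
print(statistics.mean(totals), k / rate)
print(statistics.variance(totals), k / rate ** 2)
```

If the stages were not independent, or not identically distributed, these moment relations would generally fail, which is the sense in which the gamma assumption is restrictive.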


Exercises in a Problem Solving Style. Three problem solving
style exercises were implemented in the lesson teaching eigenvalue and
eigenvector problems. For example, one problem is aimed at guiding a
student, step by step, to the goal of calculating eigenvectors of a
2 x 2 matrix. There are four or five stages required to arrive at the
final answer and all stages are linearly related, so that previously
given stages are required as prerequisites to understanding a later
stage. Therefore this type of exercise violates the assumption used
in deriving gamma distributions. Note that Weibull distributions
don't require such a restrictive assumption and hence have wide
applicability and flexibility to more general examples according to
Weibull (1951). We predict that the time-score data from exercises in a
problem-solving style will fit Weibull distributions very well, and
Tables 21 and 22 back up our prediction.









Table 21

p-values for Weibull and Gamma:
Problem-solving Type Exercises

Unit Names     p_W     p_G      N

e05.1          .98     .56     31
e05.2          .93     .05     30
e05.3          .99     .84     29









Table 22

Weibull Parameters for Problem-solving Type Exercises



Unit Names 


• 

t 

» 0 , 


c 


^0 


Average time* 


. e05..1 '. 

e05.2 ' 
* - ' e05.1' 


3.758 
3.593 


__<88(L 
.790 
.962 


. 11.36 
9.96 

12. 1A 
• < 


• 

' 18.39 . 
■ 24.77 
16,72 


JL 

"Unit 

*• - * 


tff times is 


10 seconds . 


t 

9 


r 

9 



9 


The c's for these three exercises are smaller than 1.
Since, despite this fact, the average times are very short, it may be
that the abundance of hints given during exercises allows many students
to speed up toward reaching their given goal.

7.3 Instructional Units or Areas in Matrix Algebra Lessons




Matrix algebra lessons were divided into nineteen small
segments or instructional units and the elapsed time to complete each
instructional unit was collected. Since these lessons did not adopt a
mastery learning strategy, it was impossible to collect mastery time,
which is the time needed to master a given instructional unit, so the
first completion time of each unit was used for analysis. The results
of Kolmogorov-Smirnov testing are summarized in Table 23.

Table 23

p-values from Kolmogorov-Smirnov Tests: Matrix Areas

Areas   Content                            N     p_W    p_G    Average Time

i01.1   Simple operations                 128    .09    .00       10.5
i01.2   Use of system calculator          134    .30    .01        2.0
i02.1   Multiplication of matrices A, B   135    .30    .50        6.?
i02.2   AB ≠ BA                           123    .73    .53        1.8
i02.3   Scalar product                    114    .02    .01        1.0
i02.4   Matrix product                      ?    .14    .13        1.3
i02.5   Quadratic form                    122    .33    .56        3.1
i02.6   Properties of operations          109    .21    .12        1.?
i02.7   Diagonal matrix                   104    .66    .29        2.3
i03.1   Identity matrix                   105    .14    .19        4.9
i03.2   Determinant                       103    .50    .48       13.5
i03.3   Evaluation of determinant         101    .64    .32        7.1
i03.4   Cofactors                         100    .81    .68        8.9
i03.5   Properties of determinant          98    .62    .72        9.9
i03.6   Adjoint and inverse matrix        10?    .95    .76       11.9
i04.1   Rotation of axes                   73    .00    .02        ?.8
i04.2   Orthogonal transformation          52    .56    .79       19.4
i04.3   SSCP matrix                        48    .82    .99       19.1
i05.1   Eigenvalues and eigenvectors       72    .71    .83       14.8

Unit of time is rounded to the nearest minute.






The number of areas whose p-value is larger than .20 for
Weibull is 14 and that for gamma is 12, but the number of areas whose
p-value is larger than .40 is 10 in each case. The average p-values,
p_W and p_G, are .448 and .417 respectively. It is difficult to say which
distribution is more suitable to our data, because five areas are
classified in Category 1, while seven areas are in Category 4. The
combined Categories 1, 2 and Categories 3, 4, respectively, include
eight areas each.

Three instructional units, i01.1, i02.3 and i04.1, don't fit
either of the distributions while the others fit both Weibull and gamma
pretty well. A close examination of all area data from the matrix lesson
revealed that 20 to 30 percent of time data from some areas were not
the right kind of data that we were interested in. Before the revision
of the lessons was made, quite a number of students complained about
the lack of flexibility in the original version of the lessons. The
original lesson did not have an index page, so once a student started a
lesson, he/she was forced to go through the lesson without
changing the topic until the end. Therefore students were more
concentrated on studying and many stayed on the same lesson until they
finished it all. In the new version, some students got out of a
section before they finished it, and they went back to the index page
in the middle of instruction by pressing the key that is always
available at any page. The time data used in Table 23 were not exactly
the completion time of each section since 20 to 30 percent of the
students did not complete some areas.

The time data of the old version fit Weibull very well.
The first three lessons (definitions and simple operations; matrix
multiplication; and determinant, cofactors and inverse) were divided
into nine instructional units (areas) and the time data from these
nine areas were analyzed. Their fit to the Weibull distributions was
very good, with the average p-value being .80. If the data are fairly
clean, then a small segment of instructional unit fits the Weibull
distribution very well.

It was interesting to note that one area which was given
twice during the course showed a remarkably low p-value for the second
presentation. When students studied this area for the first time,
the p-value was .95, but on the second time, it was only .03.



7.4 The Lessons of Special and General Vehicle Training Program at
Chanute Air Force Base

The Chanute AFB CBE project developed 34 lessons to teach
repairing and maintenance of various vehicles on the PLATO system.
They also developed their own computer managed instruction system and
student router. Their unit of measured time was rounded to the nearest
minute. Some tests required only a few minutes to complete for all
students, while other tests had an average time of longer than 10
minutes. About 90 percent of the examinees needed about three minutes
to complete many Mastery Validation Exams, and hence rounding to the
nearest minute was too rough to analyze these time data, so we had to
throw them away.

The time scores from the lessons were much better than those
from the Mastery Validation Exams, but a few lessons had a very short
average time needed to complete and master the lessons. For example,
Lesson mve 201a requires an average time of only 12.6 minutes to master
the lesson, yet 72 examinees studied the lesson. If the unit of time
were the nearest second, then the p-values from Kolmogorov-Smirnov
would have become larger. The plotting of a step function (observed
time data) together with a smooth curve (Weibull distribution function)
in Figure 21 has a very large increase of height around the mean value
of 12.55 minutes. That affects the z-value. These two plottings look
like a fairly close match intuitively. By contrast, the average time
needed to master Lesson mve 202a is 189.63 minutes and the steps of the
observed curve are very fine in Figure 22. The correlation of average
times with p-values from Weibull-fitting over 27 Chanute lessons is .57,
and with p-values from gamma-fitting over 27 lessons it is .34.
Therefore it will be wise to take a finer unit of time in educational
research utilizing time scores. Since Chanute lessons used a mastery
learning strategy, two kinds of time data were available; one is the
first completion time for a given lesson and the second is the time
needed until a student achieves a given criterion of mastery at the end
of the lesson test, the Mastery Validation Exam.
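
The effect of a coarse time unit on the goodness-of-fit statistic can be seen in a small deterministic sketch. It places n = 72 points at evenly spaced quantiles of a Weibull distribution, so that the unrounded points match the curve as closely as any sample of that size can, and then rounds them to whole minutes. The parameters (t0 = 5.0, c = 1.8, μ0 = 8.5, giving a mean near 12.5 minutes) and the CDF form are assumptions for illustration, not estimates from the Chanute data.

```python
import math


def weibull_cdf(t, t0, c, mu0):
    """Three-parameter Weibull CDF (assumed form); zero at or below t0."""
    if t <= t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / mu0) ** c))


def weibull_quantile(p, t0, c, mu0):
    """Inverse of weibull_cdf for 0 < p < 1."""
    return t0 + mu0 * (-math.log(1.0 - p)) ** (1.0 / c)


def ks_statistic(times, cdf):
    """Largest vertical gap between the empirical step function and cdf."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


# Hypothetical parameters; n = 72 matches the lesson cited in the text.
t0, c, mu0, n = 5.0, 1.8, 8.5, 72


def cdf(t):
    return weibull_cdf(t, t0, c, mu0)


exact = [weibull_quantile((i + 0.5) / n, t0, c, mu0) for i in range(n)]
rounded = [float(round(t)) for t in exact]  # times recorded in whole minutes

d_exact = ks_statistic(exact, cdf)
d_rounded = ks_statistic(rounded, cdf)
print(d_exact, d_rounded)
```

Rounding piles several observations onto the same minute, so the step function jumps by several multiples of 1/n at once and the distance from the smooth curve grows even though the underlying times follow the curve exactly.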

The following tables, 25 and 26, present a summary of
Kolmogorov-Smirnov tests for gamma and Weibull distributions. Appendix C
explains the content area that all Chanute lessons were aimed at, and
the average time for each lesson. Table 24 shows the Weibull parameters.
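The p and z columns of Tables 25 and 26 appear to be linked by the asymptotic Kolmogorov distribution. The sketch below assumes that z is the Kolmogorov-Smirnov statistic scaled by the square root of N and that p is the usual two-sided tail probability; this is an inference from the tabled values, not something the report states, and the function name `ks_pvalue` is only illustrative.

```python
import math

def ks_pvalue(z, terms=100):
    """Asymptotic two-sided Kolmogorov-Smirnov tail probability.

    Assumes z = sqrt(N) * D, where D is the largest gap between the
    observed step function and the fitted distribution function.
    The alternating series is truncated after `terms` terms, which is
    adequate for z above roughly 0.2.
    """
    if z <= 0.0:
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
    # Clamp to [0, 1] to guard against rounding at very small z.
    return min(1.0, max(0.0, 2.0 * total))

if __name__ == "__main__":
    for z in (0.5062, 1.1777, 1.3110):
        print(z, round(ks_pvalue(z), 4))
```

Under these assumptions, z = 1.3110 gives a p-value of about 0.064 and z = 0.5062 gives about 0.960, which agrees with the corresponding tabled entries.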

The average p-value for Weibull is .46, and that for gamma is .51.
These values are almost the same as the average p-values for the areas
in the matrix algebra lesson, but they are not so high in comparison
with those of test items and exercises in matrix algebra (refer to
Tables 14, 18, 19 and 21).

Although both of the average p-values are only around .50, about 80
percent of the lessons have p-values larger than or equal to .20 for
mastery time (time needed to achieve a given mastery level), which is a
satisfactory result. Table 27 shows that gamma is slightly better than
Weibull. Since the gamma distribution that has been considered here is a
two-parameter distribution without a location parameter, it is
bothersome to note that gamma is better for this case. Because learning
the material written in a whole lesson is not a matter of a simple
process, or an event that can be explained in

                              Table 24

      Three Weibull Parameters and the Maximum Correlation
                        for Mastery Time

Lesson       t      m.c.      c         q
  1)       6.87     .98     1.61     30.03
  2)      15.18     .99     1.55     21.54
  3)      16.21     .99     1.87     32.65
  4)       4.72     .97     1.25      8.06
  5)        .00     .98     1.53     48.32
  6)      48.67     .99     1.95    160.86
  7)       4.49     .99     2.53     41.32
  8)       8.63     .99     1.81    105.64
  9)      35.38     .98     1.79    126.09
 10)       5.38     .98     1.67     42.22
 11)      14.85     .99     1.77     42.19
 12)       3.13     .99     1.66     20.72
 13)        --      .99     1.81     46.82
 14)      13.21     .99     1.29     27.16
 15)        .00     .99     1.81     36.65
 16)       4.27     .99     1.85     11.24
 17)        .57     .99     2.40     14.37
 18)       6.38     .99      --      18.40
 19)       7.93     .99     2.11       --
 20)      11.50     .99     1.75     44.13
 21)        --      .99     2.85     20.37
 22)      20.55     .99     1.63     79.59
 23)        .76     .99     1.21     15.91
 24)        --      .99     1.75     14.27
 25)       4.33     .99     1.76       --
 26)      12.22     .99     1.53     25.01
 27)       2.53     .97      --      12.96

(-- : entry illegible in the source copy.)






                              Table 25

            Kolmogorov-Smirnov Tests for Chanute Data:
              "Goodness of Fit" Testing for Weibull

               Mastery Time               Completion Time
Lesson      p        z       N          p        z       N
  1)     0.1248   1.1777    --       0.2716   0.9983    55
  2)     0.6094   0.7606    83       0.7266   0.6895    63
  3)     0.3349   0.9440    85       0.4193   0.8811    85
  4)     0.2123   1.0587    72       0.0156   1.5574    74
  5)     0.6653   0.7274    86       0.8500   0.6106    86
  6)     0.7731   0.6621    96       0.5205   0.8146    95
  7)     0.0624   1.3168    81       0.0712   1.2914    81
  8)     0.6711   0.7240    86       0.7874   0.6530    85
  9)     0.4919   0.8328    89       0.4251   0.8771    89
 10)     0.5622   0.7890    78       0.7288   0.6894    77
 11)     0.8667   0.5981    75       0.4350   0.8703    75
 12)     0.2714   0.9987    80       0.2121   1.0590    80
 13)     0.6780   0.7198    67       0.3768   0.9117    67
 14)     0.8316   0.6236    87       0.6781   0.7198    87
 15)     0.7671   0.6658    77       0.6552   0.7334    77
 16)     0.3222   0.9543    71       0.2360   1.0333    71
 17)     0.1387   1.1550    67       0.2167   1.0538    67
 18)     0.1438   1.1471    72       0.1344   1.1618    72
 19)     0.9095   0.5627    62       0.9189   0.5538    62
 20)     0.1212   1.1838    76       0.1682   1.1124    75
 21)     0.4265   0.8761    59       0.4176   0.8823    59
 22)     0.3812   0.9084    93       0.1583   1.1260    93
 23)     0.0988   1.2264    73       0.0743   1.2832    --
 24)     0.4985   0.8286    67       0.6039   0.7638    67
 25)     0.4089   0.8884    70       0.8339   0.6220    70
 26)     0.9599   0.5062    70       0.9429   0.5289    70
 27)     0.0938   1.2370    69       0.8860   0.5830    68

(-- : entry illegible in the source copy.)




                              Table 26

            Kolmogorov-Smirnov Tests for Chanute Data:
                  "Goodness of Fit" for Gamma

                       Mastery Time          Completion Time
Lesson              p        z       N         p        z
  1) mve103      0.0643   1.3110    85      0.1260   1.1757
  2) mve104a     0.4537   0.8577    83      0.5184   0.8159
     mve104b     0.8373   0.6196    --      1.0000   0.0
  3) mve105      0.5619   0.7891    85      0.7193   0.6952
  4) mve201a     0.0361   1.4169    72      0.0544   1.3425
  5) mve201b     0.6318   0.7472    86      0.5357   0.8052
  6) mve202a     0.8646   0.5998    96      0.9889   0.4449
  7) mve202b     0.1554   1.1300    81      0.1681   1.1130
  8) mve204      0.9260   0.5468    86      0.5726   0.7827
  9) mve205a     0.5686   0.7851    89      0.7117   0.6997
 10) mve205b     0.1923   1.0818    78      0.9316   0.5410
 11) mve206a     0.7981   0.6460    75      0.5745   0.7815
 12) mve206b     0.6267   0.7503    80      0.6043   0.7636
 13) mve206c     0.7859   0.6540    67      0.5758   0.7807
 14) mve207      0.2470   1.0222    87      0.3833   0.9069
 15) mve301      0.8557   0.6064    77      0.8879   0.5814
 16) mve303      0.5185   0.8159    71      0.3720   0.9153
 17) mve304      0.6996   0.7070    67      0.7225   0.6932
 18) mve305      0.3119   0.9629    72      0.2130   1.0580
 19) mve307      0.6292   0.7488    62      0.8028   0.6430
 20) mve308      0.2856   0.9858    76      0.4262   0.8754
 21) mve401      0.8128   0.6364    59      0.0232   1.4924
 22) mve402      0.2362   1.0331    93      0.5650   0.7873
 23) mve403      0.2761   0.9944    73      0.1003   1.2232
 24) mve404      0.8396   0.6180    67      0.9764   0.4778
 25) mve405a     0.3883   0.9032    70      0.9962   0.4087
 26) mve405b     0.6758   0.7211    70      0.9895   0.4430
 27) mve405c     0.0561   1.3366    69      0.2597   1.0097

(-- : entry illegible in the source copy. The N column for completion
time is illegible in the source copy.)



parallel to a Poisson process, or like Rasch's word-reading model, it is
probable that we will have to investigate the composite distribution
model for the Weibull distribution, instead of a single distribution.
An r-component composite Weibull distribution is defined as
$F(x) = F_j(x)$ for $S_j < x < S_{j+1}$, $j = 0, 1, 2, \ldots, r$.
Further mathematical discussion will be found in Mann et al. (1975). In
future work, we will have to analyze carefully a whole task of
instruction in a lesson and divide it into finer tasks. The time-score
data from each task unit (or segment of instruction, or area) can be
represented by a Weibull distribution. If a lesson consists of k tasks,
then a k-composite Weibull distribution will be the distribution
representing the whole lesson. Since it is impossible to investigate
further along this line with Chanute lessons, we will work with matrix
area data (after cleaning up the messy data) in the near future.
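The composite definition above can be sketched as a piecewise lookup. This is only a minimal illustration under stated assumptions: each component is taken to be a three-parameter Weibull distribution function, the left endpoint of each interval is assigned to the interval it opens, and the parameter values are hypothetical. The conditions that make the pieces join into a proper distribution function (see Mann et al.) are not enforced here.

```python
import math
from bisect import bisect_right

def weibull_cdf(x, location, scale, shape):
    """Three-parameter Weibull distribution function; zero at or below the location."""
    if x <= location:
        return 0.0
    return 1.0 - math.exp(-(((x - location) / scale) ** shape))

def composite_weibull_cdf(x, breakpoints, components):
    """Evaluate F(x) = F_j(x) on the j-th interval.

    breakpoints: increasing interior partition points S_1 < S_2 < ...
    components:  one (location, scale, shape) tuple per interval,
                 so len(components) == len(breakpoints) + 1.
    """
    if len(components) != len(breakpoints) + 1:
        raise ValueError("need exactly one component per interval")
    j = bisect_right(breakpoints, x)  # index of the interval containing x
    return weibull_cdf(x, *components[j])

# Hypothetical two-task lesson: the first task governs times below
# 10 minutes, the second governs times of 10 minutes or more.
breakpoints = [10.0]
components = [(0.0, 8.0, 1.5), (0.0, 12.0, 2.0)]

if __name__ == "__main__":
    for minutes in (5.0, 10.0, 20.0):
        print(minutes, composite_weibull_cdf(minutes, breakpoints, components))
```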

                              Table 27

      27 Vehicle Maintenance Training Lessons, p-values
           from Kolmogorov-Smirnov Testing for
             Weibull and Gamma Distributions

                    p > .20       p > .40       p > .50
Weibull   1*       20 (74%)      15 (56%)      12 (44%)
          2**      21 (78%)      16 (59%)      12 (44%)
Gamma     1*       22 (81%)      18 (67%)      18 (67%)
          2**      23 (85%)      17 (63%)      16 (59%)

 * Time needed to complete a lesson.
** Time needed to reach a given mastery level.



Although the mastery time obtained from Chanute lessons did not fit
Weibull distributions quite as well as time-score data from matrix test
items did, the shape parameter c in this context has an important
relationship with one of the current topics in educational measurement:
the problem of false negatives and false positives of
criterion-referenced tests. The detailed analyses and discussion of the
role of the shape parameter c will be given in the next section. The
table of Weibull parameters for mastery time data from Chanute lessons
was given in Table 24.



Revised Chanute Lessons. After the initial data (the result of the
previous section was based on these data) from all lessons in the
vehicle maintenance training course were collected and analyzed, seven
lessons were selected for further modification and revision. A year
later, the first completion times of these polished, revised lessons
were collected and tested for goodness of fit with Weibull
distributions. The changes that were made were quite extensive, and the
average times of the lessons became quite different from the original
versions of the seven lessons: some got longer but others got shorter.
The p-values from Kolmogorov-Smirnov tests became larger for most of
them, and much larger for lessons 202b, 308 and 401. These values are
shown in Table 28.

                              Table 28

        Comparison of p-values from Kolmogorov-Smirnov
           Tests for the Original Lessons and
                 their Improved Versions

Lesson     Original p-value     Revised p-value
 202b            .07                  .90
 204             .79                  .54
 207             .67                  .68
 301             .66                  .73
 307             --                   .91
 308             .17                  .79
 401             .42                  .79

(-- : entry illegible in the source copy.)



Table 28 might suggest that the time data from the more polished,
improved lessons fit Weibull distributions better than those from the
less polished, original versions of the lessons. The less polished
lessons usually contain ambiguous explanations, typographical errors,
inappropriate feedback, or an improper amount and quality of help.
Eliminating such distractions, which badly affected a student's pace of
learning, especially for those who were not so bright or who knew
nothing about the material, might have caused the better fit with
Weibull distributions. This implies that the study of CRR will lead us
to identify the quality of feedback and the appropriateness of the help
branch in terms of using qualitative analysis methods. We believe that
our research will be very useful to the area of instructional design in
a practical sense; we can provide a quantitative tool to instructional
designers, who are mostly artists.






THE CORRELATES OF PROBABILITIES OF MISCLASSIFICATION BY CRITERION-
REFERENCED TESTS



In this section we explore what variables are associated with erroneous
decisions, that is, calling a non-master a master (false positives, F+)
and calling a master a non-master (false negatives, F-), based on the
criterion-referenced tests of the Chanute AFB CBE Project. The Weibull
shape parameter c turned out to be a prominent predictor of the
estimated probabilities p(F+) and p(F-).

Another thrust of this section is the definition of a new index, dubbed
the "efficiency index," which we believe to be a reasonable measure of
the quality of a lesson. A factor analysis using 18 variables (including
p(F+), p(F-), p(F+ or F-), failure rate, the three Weibull parameters,
the distance between the optimum cutoff point and the mean, etc.) along
with this efficiency index yielded a distinct factor loading only this
variable and c.



8.1 Beta Binomial Model 



Criterion-referenced testing (CRT) has gained much attention from
educational measurement and testing specialists in recent years. The
object of criterion-referenced testing is not to distinguish finely
among subjects, but to classify subjects into mastery and non-mastery
groups. Hence the accuracy of judging the non-mastery or mastery status
of examinees becomes the main concern.

Since criterion-referenced tests are commonly used in situations where
students are expected to achieve the level of mastery, say 90 percent
correct, the observed scores become a bounded variable. If there are
subjects with true scores near the "ceiling" or the "floor," it becomes
implausible to assume that the errors of measurement are distributed
independently of true scores for those near the boundary.

Lord and Novick (1968) argue about the plausible distributional forms
of observed CRT scores and true scores in Chapter 23 of their book,
"Statistical Theories of Mental Test Scores." We will follow their
steps and adopt the binomial error model for CRT scores. The binomial
error model assumes that if each MVE test is aimed at measuring the
learning level of a topic taught in the Vehicle Training Course of the
Chanute AFB CBE Project, for instance, then all items in the test must
measure the same task. In other words, all items in a test have one and
only one common factor with 0-1 scoring. Suppose there is a pool of
items measuring the same task, and taking an item out of the pool is an
independent event, that is, answering the earlier items on the test
does not affect the ability of a student to answer later items
correctly. Then we can formulate the distribution of raw scores x by a
binomial distribution with parameter $\theta$, in which $\theta$ is the
proportion of items that a student would answer correctly over the
entire pool of items. If T is a fixed true score and e is an error of
measurement, then the raw score x can be expressed as the sum of the
two, $x = T + e$, and $\theta$ is given by

$$\theta = T/n$$

where n is the number of items in the test. Let $g(x \mid \theta)$ be
the binomial distribution of x at any given true ability level
$\theta$; then the conditional distribution $g(x \mid \theta)$ is given
by

$$g(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n.$$
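The binomial error model can be made concrete with a short calculation. The sketch below evaluates g(x | theta) and the chance that an examinee with a given true ability reaches an observed cutoff; the function names and the 10-item, cutoff-of-8 example are illustrative assumptions, not taken from the Chanute data.

```python
from math import comb

def binomial_pmf(x, n, theta):
    """g(x | theta): probability of x correct answers out of n items."""
    return comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)

def prob_at_or_above(cutoff, n, theta):
    """Probability that the observed score reaches the cutoff, P(x >= cutoff | theta)."""
    return sum(binomial_pmf(x, n, theta) for x in range(cutoff, n + 1))

if __name__ == "__main__":
    # An examinee whose true ability sits exactly at an 80 percent mastery
    # level, taking a hypothetical 10-item test with an observed cutoff of 8.
    print(round(prob_at_or_above(8, 10, 0.80), 4))
```

Under this model an examinee with theta = 0.80 reaches a cutoff of 8 on a 10-item test only about 68 percent of the time, which illustrates why short tests can misclassify examinees whose true ability lies near the mastery level.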

It is interesting to note that this model does not pay attention to
item differences. Traditional measurement indices such as item
difficulty or the item discrimination index are not the major concern
in the binomial error model. Rather, finding out how accurately a test
can estimate an examinee's pass or fail status with respect to a given
mastery criterion is the main concern of the model.

Keats and Lord (1962) investigated the relationship between the
distributions of observed test scores and true scores. The test scores
could be adequately represented by the hypergeometric distribution f(x)
with a negative parameter (the negative hypergeometric distribution),
and the true score distribution could be represented by the
two-parameter beta distribution $g(\theta)$,

$$g(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-n}}{B(a,\; b-n+1)}$$

where $a > 0$ and $b > n-1$. And also

$$f(x) = \int_0^1 \frac{\theta^{a-1}(1-\theta)^{b-n}}{B(a,\; b-n+1)} \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}\,d\theta, \qquad x = 0, 1, \ldots, n.$$

In classical test theory, the estimate of a true score is given by
regressing the true score T on the observed score x, and the equation
is given by

$$E(T \mid x) = \rho\,x + (1-\rho)\,\mu_x$$

where $\rho$ is the reliability of the test and $\mu_x$ is the mean of
the test scores.

In the binomial error model, the estimate of a true score is given by a
similar equation,

$$E(T \mid x) = \alpha_{21}\,x + (1-\alpha_{21})\,\mu_x, \qquad x = 0, 1, \ldots, n,$$

where $\alpha_{21}$ is the ratio of number-correct true-score variance
to observed-score variance and is given by

$$\alpha_{21} = \frac{n}{n-1}\left[1 - \frac{\mu_x\,(n-\mu_x)}{n\,\sigma_x^{2}}\right].$$

Table 29 is the summary of information from the Mastery Validation
Exams at Chanute.
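The coefficient and the regression estimate can be checked numerically. The sketch below assumes that the coefficient is the Kuder-Richardson formula 21 value computed from the number of items, the mean, and the standard deviation; the function names are illustrative. The example uses the mve201a statistics reported with Figure 23 (n = 10, mean = 9.4737, SD = 0.9726).

```python
def alpha21(n_items, mean, sd):
    """Ratio of true-score to observed-score variance (Kuder-Richardson formula 21)."""
    return (n_items / (n_items - 1.0)) * (
        1.0 - mean * (n_items - mean) / (n_items * sd ** 2)
    )

def estimated_true_score(x, n_items, mean, sd):
    """E(T | x): the observed score shrunk toward the test mean."""
    a21 = alpha21(n_items, mean, sd)
    return a21 * x + (1.0 - a21) * mean

if __name__ == "__main__":
    print(round(alpha21(10, 9.4737, 0.9726), 4))
    print(round(estimated_true_score(8, 10, 9.4737, 0.9726), 3))
```

With these inputs the coefficient comes out at about 0.525, in line with the value listed for mve201a in Table 29, so an observed score is pulled roughly halfway toward the test mean.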



The mastery level of the Mastery Validation Exams (MVE) of the 34
lessons in the Chanute AFB CBE Project was set at 80 percent, although
it is hard to prove that 80 percent is the most appropriate level for
their program. Block (1972) showed in his experimental study that
attainment of a 95 percent mastery level maximized student learning of
cognitive tasks in his matrix algebra course, while an 85 percent level
maximized learning as characterized by affective criteria.

Since Chanute's 34 lessons are designed to be "homogeneous" with
respect to content and teaching style, all lessons are written under
the same principle with the same tutorial logic, although the subject
matter in each lesson is different. Chanute's lessons are therefore not
linearly related, and the content difficulty of the lessons is not
hierarchically ordered as it would be in teaching mathematics,
arithmetic, or foreign languages. If the lessons are linearly related,
the mastery level set for the earlier instructional units should be
higher than that of the later instructional units. If the goal of the
second unit is the attainment of an 85 percent mastery level, then the
mastery level of the first unit might be 90 percent, or some other
level higher than 85 percent. Since there is no analytical technique to
provide the optimal level of mastery learning, definite statements
about the determination of ideal mastery levels cannot be made at this
time. Linn (1978) provides an excellent discussion of the topic of
"setting standards."






                              Table 29

  The Summary of Simple Statistics of Mastery Validation Exams

Test         Mean       SD     Items     α21       N
mve103       7.388    1.124      8     0.6321     85
mve104a     11.892    0.442     12     0.4910     83
mve104b     10.120    1.728     11     0.8018     83
mve105       7.706    0.737      8     0.5470     85
mve201a      9.474    0.973     10     0.5254     76
mve201b      8.907    1.325     10     0.4951     86
mve202a     16.186    2.934     20     0.6753     97
mve202b      9.720    0.634     10     0.3573     82
mve204       8.557    1.681     10     0.6253     88
mve205a      6.767    1.558      9     0.3470     90
mve205b      8.110    1.736     10     0.5457     82
mve206a     12.038    1.574     13     0.6942     78
mve206b     15.250    1.619     17     0.4259     80
mve206c     19.257    1.151     20     0.4841     70
mve207       3.761    1.124      5     0.3287     88
mve301       8.727    1.501     10     0.5635     77
mve303      17.380    2.257     20     0.5824     71
mve304       9.209    1.366     10     0.6771     67
mve305       7.458    0.934      8     0.4806     72
mve307      14.683    1.522     16     0.5101     63
mve308       9.037    1.170     10     0.4045     82
mve401       9.254    1.015     10     0.3673     63
mve402      14.138    2.335     17     0.5988     94
mve403       8.095    2.487     10     0.8340     84
mve404       4.254    0.876      5     0.2166     67
mve405a      9.169    1.069     10     0.3701     71
mve405b      8.329    1.991     10     0.7208     70
mve405c      9.087    1.222     10     0.4934     69






Mastery levels are usually set by instructors or the author of a
lesson, but the decision of mastery or non-mastery is based on
examinees' observed test scores. The score that is used to decide
mastery and non-mastery is called the "cutoff." Mastery and non-mastery
statuses ought to be defined on the basis of true ability $\theta$, not
observed test scores x that are subject to measurement errors. If true
ability were known, there would be no incorrect classifications.
Unfortunately, true scores are impossible to obtain in practice, so we
have to find a way to minimize misclassification.



There are four kinds of classifications: (1) an examinee's true ability
$\theta$ is at least as high as a given mastery level $\theta_0$ and
the observed score x is at least as high as the cutoff score c, that is
$A = \{\theta \ge \theta_0 \text{ and } x \ge c\}$; (2) $\theta$ is
lower than $\theta_0$ and x is also lower than c, that is
$B = \{\theta < \theta_0 \text{ and } x < c\}$; (3) $\theta$ is lower
than $\theta_0$, but x is at least as large as c,
$F+ = \{\theta < \theta_0 \text{ and } x \ge c\}$; (4) $\theta$ is at
least as high as $\theta_0$, but x is lower than c,
$F- = \{\theta \ge \theta_0 \text{ and } x < c\}$. Figure 22 shows
these four conditions.





[Figure 22. Classification Table. θ: true ability; x: observed score;
θ0: true mastery level; c: observed cutoff. The probabilities of these
events will be denoted by P(A), P(B), P(F+) and P(F-), respectively.]



Millman (1975) and then Novick and Lewis (1975) reported the percentage
of students expected to be misclassified for a given cutoff with
various numbers of test items. Millman used the binomial error model,
but Novick and Lewis used the Bayesian beta binomial error model.

According to Millman's calculations, the percentage of students
expected to be misclassified at an 80 percent mastery level using a
10-item test could be as high as 53 percent.







Emrick (1972) and Huynh (1976) considered the loss ratio Q of F- to F+
as a means of controlling misclassification, especially false
advancement. If later instructional units require the knowledge and
skill acquired in earlier units, false advancement will be a problem.
A loss ratio of 10 implies that the event F- is ten times as serious as
the event F+. Since F- stands for the event in which a student has
really mastered the given instructional unit but his/her observed score
happens to be lower than the cutoff, retaining such a student in the
same unit is not efficient. If the instructional units are fairly
independent of one another, as are the lessons in the Vehicle Training
Program at Chanute Air Force Base, then an appropriate loss ratio would
be 1, or at least it is not necessary to set it as high as 10.

Huynh (1976) proposed an evaluation of the cutoff score that minimizes
the occurrence of misclassifications for a given loss ratio. With his
cutoff score, the loss ratio of having a false positive to having a
false negative stays the same, say 10, while the linear combination of
the probabilities of both events and the loss ratio (the average loss)
is minimized. We will discuss Huynh's method in more detail in
conjunction with the 34 Chanute lessons and their MVE test scores.



8.2 Evaluation of the Optimal Cutoff Scores 


Huynh derived the optimal cutoff $c_0$ of a test for a given mastery
level $\theta_0$ and loss ratio Q so as to minimize the average loss
function R(c), which is the following linear combination of the
probabilities of false positive and false negative:

$$R(c) = P(F+) + Q\,P(F-)$$

It turns out that $c_0$ is the smallest integer such that the
incomplete beta function $I_{\theta_0}(a+c_0,\; n+b-c_0)$ is smaller
than or equal to $Q/(1+Q)$, where

$$P(c_0) = I_{\theta_0}(a+c_0,\; n+b-c_0) = \frac{\int_0^{\theta_0} \theta^{a+c_0-1}(1-\theta)^{n+b-c_0-1}\,d\theta}{B(a+c_0,\; n+b-c_0)}.$$

In order to apply Huynh's result to evaluate $c_0$, we need the help of
a computer to calculate and plot the values of the incomplete beta
function for $c_0 = 0, 1, 2, \ldots, n$. The PLATO system eases these
steps, and we can obtain the answer through the program "cutoff."
Figure 23 illustrates the procedure to determine the optimal cutoff
$c_0$.

[Figure 23. Determining the optimal cutoff c0 so as to minimize
misclassification. lesson = MVE201a, subjects = 76, n = 10,
mean = 9.4737, SD = 0.9726, a = 8.5560, b = 0.4753, α21 = 0.53.]

The parameters a and b are obtained from the mean and standard
deviation of the test and the number of items in the test (denoted by
n). Table 30 shows the values of the incomplete beta function
$I_{\theta_0}(i)$ at each point $i = 1, 2, \ldots, n$, where a and b
are calculated from the test scores of mve201a by the formulas

$$a = \left(-1 + \frac{1}{\alpha_{21}}\right)\mu_x, \qquad b = -a + \frac{n}{\alpha_{21}} - n.$$

The curve in Figure 23 is obtained by plotting the points in Table 30.
The horizontal lines marked by losses 0.5, 1, 2, 3, and 4 in Figure 23
help to evaluate the optimal cutoff, which minimizes the average loss
R(c) at $c_0$ for the partially known loss ratio Q and a given true
mastery level $\theta_0$. Since the contents of all lessons discussed
in the Chanute AFB CBE Project deal with independent topics across the
lessons, and the lessons are not linearly or hierarchically related, a
loss ratio of 1 will be reasonable. Note that in Figure 23 the smallest
integer value of i for which the curve P(i) goes under the line of loss
ratio 1 is 7. Therefore $c_0 = 7$ is the ideal cutoff score of the
test mve201a.



                              Table 30

                      Ten Points in Figure 23

Item i      a+i       n+b-i     I_θ0(a+i, n+b-i)
   1       9.556      9.475          0.998
   2      10.556      8.475          0.991
   3      11.556      7.475          0.969
   4      12.556      6.475          0.913
   5      13.556      5.475          0.796
   6      14.556      4.475          0.608
   7      15.556      3.475          0.376
   8      16.556      2.475          0.169
   9      17.556      1.475          0.045
  10      18.556      0.475          0.004

θ0 = .80, Test = mve201a, a = 8.5560, b = 0.4753
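The procedure behind Figure 23 and Table 30 can be sketched in a few lines. This is not the PLATO "cutoff" program; it is a minimal illustration that assumes the formulas for the coefficient, a, and b given in the text, and it evaluates the incomplete beta function by Simpson's rule (valid here because the first argument exceeds 1 and the mastery level is below 1). All function names are illustrative.

```python
import math

def log_beta(p, q):
    """Natural log of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def reg_inc_beta(x, p, q, steps=2000):
    """Regularized incomplete beta I_x(p, q) by composite Simpson's rule.

    Assumes p > 1 and 0 < x < 1 so the integrand is finite on [0, x];
    `steps` must be even.
    """
    h = x / steps
    def f(t):
        return t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)
    total = f(0.0) + f(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return (total * h / 3.0) / math.exp(log_beta(p, q))

def beta_params(n, mean, sd):
    """Beta parameters a, b from test statistics, following the formulas in the text."""
    a21 = (n / (n - 1.0)) * (1.0 - mean * (n - mean) / (n * sd ** 2))
    a = (-1.0 + 1.0 / a21) * mean
    b = -a + n / a21 - n
    return a, b

def optimal_cutoff(n, a, b, theta0, loss_ratio):
    """Smallest c with I_theta0(a + c, n + b - c) <= Q / (1 + Q); None if no such c."""
    threshold = loss_ratio / (1.0 + loss_ratio)
    for c in range(0, n + 1):
        if reg_inc_beta(theta0, a + c, n + b - c) <= threshold:
            return c
    return None

if __name__ == "__main__":
    a, b = beta_params(10, 9.4737, 0.9726)   # mve201a statistics from Figure 23
    print(round(a, 3), round(b, 3))
    for i in range(1, 11):
        print(i, round(reg_inc_beta(0.80, a + i, 10 + b - i), 3))
    print("c0 =", optimal_cutoff(10, a, b, 0.80, 1.0))
```

With the mve201a statistics, a mastery level of .80, and a loss ratio of 1, the search stops at a cutoff of 7, the same value read from Figure 23. Raising the loss ratio raises the threshold Q/(1+Q), so the optimal cutoff can only stay the same or move down.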






It is interesting to note that the cutoff score actually used for
mve201a in the Chanute training program, c = 8, gives a slightly larger
value of the probability of misclassification R(c) = P(F+) + P(F-),
where Q = 1, than the theoretically derived $c_0$ does. This does not
hold for P(F+), the probability of a false positive, or P(F-), the
probability of a false negative, taken separately.

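The probability of event B in Figure 22,
$P(B) = P(\theta < \theta_0,\; x < c)$, can be expressed by a linear
combination of beta functions and incomplete beta functions, because

$$P(B) = \sum_{i=0}^{c-1}\int_0^{\theta_0} p(\theta)\,f(i \mid \theta)\,d\theta
       = \sum_{i=0}^{c-1}\int_0^{\theta_0} \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)} \binom{n}{i}\,\theta^{i}(1-\theta)^{n-i}\,d\theta$$

$$\phantom{P(B)} = \frac{1}{B(a,b)} \sum_{i=0}^{c-1} \binom{n}{i}\, B(a+i,\; b+n-i)\, I_{\theta_0}(a+i,\; b+n-i).$$

Similarly,

$$P(F-) = P(\theta \ge \theta_0,\; x < c) = P(x < c) - P(\theta < \theta_0,\; x < c) = P(x < c) - P(B)$$

where

$$P(x < c) = \sum_{x < c} f(x) = \frac{1}{B(a,b)} \sum_{i=0}^{c-1} \binom{n}{i}\, B(a+i,\; n+b-i),$$

and

$$P(F+) = P(\theta < \theta_0,\; x \ge c) = P(\theta < \theta_0) - P(B)$$

where

$$P(\theta < \theta_0) = I_{\theta_0}(a, b).$$

Thus, we obtain the following calculation formulas for P(A), P(B),
P(F-) and P(F+):

$$P(F+) = I_{\theta_0}(a,b) - \frac{1}{B(a,b)} \sum_{i=0}^{c-1} \binom{n}{i}\, B(a+i,\; b+n-i)\, I_{\theta_0}(a+i,\; b+n-i)$$

$$P(F-) = \frac{1}{B(a,b)} \sum_{i=0}^{c-1} \binom{n}{i}\, B(a+i,\; n+b-i)\,\bigl(1 - I_{\theta_0}(a+i,\; b+n-i)\bigr)$$

$$P(A) = 1 - I_{\theta_0}(a,b) + \frac{1}{B(a,b)} \sum_{i=0}^{c-1} \binom{n}{i}\, B(a+i,\; n+b-i)\,\bigl(I_{\theta_0}(a+i,\; b+n-i) - 1\bigr)$$

$$P(B) = \frac{1}{B(a,b)} \sum_{i=0}^{c-1} \binom{n}{i}\, B(a+i,\; b+n-i)\, I_{\theta_0}(a+i,\; b+n-i)$$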
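The four calculation formulas translate directly into code. The sketch below is a minimal illustration, not the program used for the report: it assumes a beta prior with parameters a and b, evaluates the incomplete beta function by Simpson's rule (adequate when the first argument exceeds 1 and the mastery level is below 1), and uses illustrative function names. The example values are those given for mve201a.

```python
import math

def log_beta(p, q):
    """Natural log of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def reg_inc_beta(x, p, q, steps=2000):
    """Regularized incomplete beta I_x(p, q) by composite Simpson's rule.

    Assumes p > 1 and 0 < x < 1; `steps` must be even.
    """
    h = x / steps
    def f(t):
        return t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)
    total = f(0.0) + f(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return (total * h / 3.0) / math.exp(log_beta(p, q))

def classification_probs(n, a, b, theta0, c):
    """Return P(A), P(B), P(F+), P(F-) for observed cutoff c."""
    p_theta_below = reg_inc_beta(theta0, a, b)       # P(theta < theta0)
    p_x_below = 0.0                                  # P(x < c)
    p_b = 0.0                                        # P(theta < theta0, x < c)
    for i in range(c):
        # Beta-binomial probability of observed score i.
        f_i = math.comb(n, i) * math.exp(log_beta(a + i, b + n - i) - log_beta(a, b))
        p_x_below += f_i
        p_b += f_i * reg_inc_beta(theta0, a + i, b + n - i)
    false_pos = p_theta_below - p_b
    false_neg = p_x_below - p_b
    p_a = 1.0 - p_theta_below - false_neg
    return {"A": p_a, "B": p_b, "F+": false_pos, "F-": false_neg}

if __name__ == "__main__":
    for cutoff in (7, 8):
        probs = classification_probs(10, 8.5560, 0.4753, 0.80, cutoff)
        print(cutoff, {k: round(v, 4) for k, v in probs.items()})
```

For mve201a the results at cutoffs 7 and 8 should land close to the entries reported for that test in Table 31, and the sum P(A) + P(F+) equals the beta-binomial probability that the observed score reaches the cutoff.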



The probabilities of each misclassification for all available Mastery
Validation Exams were calculated and are summarized in Table 31.

Since the sum of the probabilities of A, B, F+ and F- is 1, the sum of
the probabilities of A and B must have its maximum value at $c_0$,
where the sum of the probabilities of F+ and F- reaches its minimum.
Since the mastery and non-mastery status of examinees is actually
determined by the observed cutoff c, the probability $P(x \ge c)$ is
the probability of the observed mastery status. Column 6 in Table 31,
headed by P(A or F+), is the estimated probability of passing the
mastery criterion judged by the observed scores using cutoff c and
cutoff $c_0$ respectively. The success rates in Column 7 are the
actually observed percentages of examinees who achieved the mastery
level, i.e. who obtained scores greater than or equal to c. Table 31
also indicates that the actually used cutoff scores c produce higher
probabilities of misclassification than the theoretically determined
cutoffs $c_0$, except in a few cases. Since the theoretical cutoffs are
determined so as to minimize the average loss R(c), in our case the sum
of the probabilities of false negative F- and false positive F+, the
values in Column 5 of Table 31, P(F+ or F-), are generally no larger
for $c_0$ than for c. The sum of the probabilities of A and F+ is the
expected success rate, so this sum matches the observed success rate
given in the last column fairly well. If the $c_0$ values were used as
cutoffs for MVE test scores, only 12 lessons would have a probability
of observed success less than .90, while 20 lessons have values of
P(A or F+) less than .90 when the cutoffs c are used.

Since the probability of false negative, P(F-), stands for the case
that an examinee really mastered the goal of the instructional unit but
his/her observed score happened to be lower than the used cutoff c,
he/she does not really have to repeat the instruction. If efficiency of
training in terms of shortening the training time is the main concern,
then P(F-) should not be so large. For example, MVE207



                              Table 31

          Estimated Probability of Misclassifications

                                                                   Success
Test       Cutoff(a)   P(F+)    P(F-)   P(F+ or F-)  P(A or F+)     rate
mve103     c0 =  6    0.0621   0.0162     0.0783       0.9247       .89
           c  =  7    0.0314   0.0639     0.0953       0.8462
mve104a    c0 =  7    0.0026   0.0001     0.0026       0.9997       .94
           c  = 10    0.0011   0.0057     0.0068       0.9927
mve104b    c0 =  9    0.0348   0.0259     0.0606       0.8705       .86
           c  =  9    0.0348   0.0259     0.0606       0.8705
mve105     c0 =  6    0.0235   0.0094     0.0329       0.9739       .88
           c  =  7    0.0123   0.0399     0.0522       0.9323
mve201a    c0 =  7    0.0357   0.0064     0.0421       0.9788       .90
           c  =  8    0.0238   0.0262     0.0499       0.9472
mve201b    c0 =  7    0.1078   0.0146     0.1223       0.9375       .72
           c  =  8    0.0710   0.0556     0.1266       0.8598
mve202a    c0 = 16    0.1163   0.0624     0.1788       0.6495       .82
           c  = 16    0.1163   0.0624     0.1788       0.6495
mve202b    c0 =  5    0.0055   0.0001     0.0056       0.9998       .98
           c  =  8    0.0031   0.0122     0.0153       0.9853
mve204     c0 =  8    0.0996   0.0503     0.1499       0.7803       .94
           c  =  8    0.0996   0.0503     0.1499       0.7803
mve205a    c0 =  8    0.1428   0.1341     0.2769       0.3612       .79
           c  =  8    0.1428   0.1341     0.2769       0.3612
mve205b    c0 =  8    0.1507   0.0634     0.2141       0.6913       .82
           c  =  8    0.1507   0.0634     0.2141       0.6913
mve206a    c0 = 10    0.0478   0.0184     0.0662       0.9207       .82
           c  = 11    0.0266   0.0535     0.0801       0.8644
mve206b    c0 = 12    0.0606   0.0113     0.0719       0.9708       .82
           c  = 14    0.0305   0.0911     0.1216       0.8608
mve206c    c0 = 13    0.0057   0.0003     0.0061       0.9991       .95
           c  = 16    0.0030   0.0116     0.0146       0.9852
mve207     c0 =  5      --     0.1957     0.2922       0.3070       .91
           c  =  4    0.2878   0.0547     0.3425       0.6393
mve301     c0 =  8    0.0894   0.0540     0.1434       0.8184       .79
           c  =  8    0.0894   0.0540     0.1434       0.8184
mve303     c0 = 15    0.1070   0.0266     0.1336       0.8867        --
           c  = 16    0.0730   0.0653     0.1383       0.8140
mve304     c0 =  8    0.0471   0.0292     0.0763       0.8922       .82
           c  =  8    0.0471   0.0292     0.0763       0.8922
mve305     c0 =  5    0.0632   0.0036     0.0668       0.9827       .96
           c  =  7    0.0247   0.0787     0.1034       0.8691
mve307     c0 = 11    0.0526   0.0056     0.0582       0.9797       .81
           c  = 12    0.0413   0.0187     0.0600       0.9553
mve308     c0 =  7    0.0732   0.0147     0.0880       0.9601       .83
           c  =  8    0.0498   0.0578     0.1076       0.8936
mve401     c0 =  7    0.0364   0.0109     0.0473       0.9872       .83
           c  =  8    0.0252   0.0451     0.0704       0.9328
mve402     c0 = 13    0.1494   0.0395     0.1890       0.7809       .79
           c  = 14    0.0910   0.0961     0.1871       0.6660
mve403     c0 =  8    0.0771   0.0294     0.1065       0.7048       .79
           c  =  8    0.0771   0.0294     0.1065       0.7048
mve404     c0 =  3    0.2100   0.0130     0.2230       0.9564      1.00
           c  =  4    0.1455   0.0840     0.2296       0.8208
mve405a    c0 =  6    0.0560   0.0025     0.0585       0.9919      1.00
           c  =  8    0.0326   0.0513     0.0839       0.9196
mve405b    c0 =  8    0.0987   0.0419     0.1405       0.7344       .91
           c  =  8    0.0987   0.0419     0.1405       0.7344
mve405c    c0 =  7    0.0794   0.0123     0.0917       0.9543       .94
           c  =  8    0.0527   0.0478     0.1005       0.8921

(a) c0 is the theoretically derived cutoff to minimize P(F+) + P(F-).
    c is the cutoff actually used in the PLATO Service Program at
    Chanute.

(-- : entry illegible in the source copy.)






has P(F-) = .1957, so that 88 x 0.1957, or 17 out of a total of 88
students, repeated the same instruction unnecessarily. Of course this
is an extreme case, and most P(F-) values are less than .10, which
means that five to eight students repeated the same lesson mistakenly.
Table 32 shows the number of students who will be misclassified or were
misclassified.

We conclude that most cutoffs of the Mastery Validation Exams used at
Chanute were not the best choice. By adopting the theoretically derived
cutoffs $c_0$, the probability of misclassification could have been
minimized. Note that P(F+) at cutoff $c_0$ for each MVE, except for
MVE207 (which has $c_0$ larger than c, while the others have the
reverse), becomes larger than or equal to the value of P(F+) at cutoff
c, while P(F-) shows the reverse phenomenon. The appropriate judgement
of which misclassification should be minimized must be made by a test
administrator through deciding on the loss ratio Q. We set Q = 1
because all lessons were considered to be not related linearly. We have
to face the problem of how to put weights on the two cases: the
increased chance of having students advance by mistake and the
decreased chance of retaining students unnecessarily in the lessons
they just finished, or the reverse. If a training program must be
finished in a hurry, then it is better to set Q so as to minimize the
chance of false retainment, P(F-). Thus, Huynh's method gives us more
control over the situation, but also brings in more complications of
judgement. We do not know how to make the best judgement on two issues:
what level a mastery criterion should be set at, and how large the loss
ratio Q should be. Neither decision can be made analytically or in a
logical way. Only carefully designed experimental research can answer
what the best decisions are.



Let us examine Huynh's method more carefully. Figure 24
shows plots similar to those of Figure 23, but the mastery levels of .70,
.75, .85, .90 were also plotted together with .80 on the same screen.
The dotted lines are each marked by their level of mastery.
The horizontal lines correspond to various loss ratios, .50, 1.,
1.5,..., 20. In Figure 24, the optimal cutoff c_0 at the mastery level
of .80 is 9 with the loss ratio of 1.00. c_0=9 can be the optimal
cutoff at the mastery level of .85 with the loss ratio of Q=2, and
also at the mastery level of .90 with Q=2.5. Indeed, the range of Q
for c_0=9 at 80 percent is from 0 to 1.2, for c_0=9 at 85 percent is
from 0.8 to 3, and for c_0=9 at 90 percent is from 2.25 to 9.25. In the
last example, a choice of loss ratio between 2.25 and 9.25 will lead us
to select c_0=9 at the mastery level of .90. Figure 24 shows that the
range of loss ratio Q for c_0=8 and the mastery level of .90 becomes
from 9.25 to over 30. The average loss P(F+) + Q P(F-) associated with
Q=9.25 and 30 will be quite different, but P(F-) and P(F+) are determined
uniquely with c_0=9 and the mastery level of .9. Test administrators
will need more guidance to decide the best loss ratio for their testing.
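
The role of the loss ratio can be illustrated with a short sketch that
compares candidate cutoffs by the average loss P(F+) + Q P(F-). It is
only a hedged illustration: it compares just the two cutoffs tabulated
earlier for mve405c (7 and 8, with their listed error probabilities)
rather than carrying out Huynh's full procedure, and the function names
are assumptions introduced here.

```python
def average_loss(p_false_pos: float, p_false_neg: float, q: float) -> float:
    """Average loss P(F+) + Q * P(F-) for one candidate cutoff."""
    return p_false_pos + q * p_false_neg


def best_cutoff(candidates: dict, q: float) -> int:
    """Return the cutoff whose (P(F+), P(F-)) pair gives the smallest loss."""
    return min(candidates, key=lambda c: average_loss(*candidates[c], q))


# (P(F+), P(F-)) for the two cutoffs tabulated for mve405c.
mve405c = {7: (0.0794, 0.0123), 8: (0.0527, 0.0478)}

for q in (0.5, 1.0, 2.0):
    chosen = best_cutoff(mve405c, q)
    loss = average_loss(*mve405c[chosen], q)
    print(f"Q = {q}: cutoff {chosen}, average loss {loss:.4f}")
```

With these two candidates, a loss ratio below about 0.75 favors the
higher cutoff and a larger loss ratio favors the lower one, which is the
kind of trade-off the test administrator has to weigh.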




Table 32

Estimated Number of Misclassified Students

Test       Cutoff(a)      F+      F-

mve103     c_0    6      5.3     1.4
           c      7      2.7     --
mve104a    c_0    7      0.2     0.0
           c     10      0.1     --
mve104b    c_0    9      2.9     2.1
           c      9      2.9     2.1
mve105     c_0    6      2.0     0.8
           c      7      1.0     --
mve201a    c_0    7      2.7     --
           c      8      1.8     2.0
mve201b    c_0    7      9.3     1.3
           c      8      6.1     4.8
mve202a    c_0   16     11.3     6.1
           c     16     11.3     6.1
mve202b    c_0    5      0.5     0.0
           c      8      0.3     --
mve204     c_0    8      --      4.4
           c      8      --      4.4
mve205a    c_0    8     12.9    12.1
           c      8     12.9    12.1
mve205b    c_0    8     12.4     5.2
           c      8     12.4     5.2
mve206a    c_0   10      3.7     1.4
           c     11      2.1     4.2
mve206b    c_0   12      4.8     0.9
           c     14      2.4     7.3
mve206c    c_0   13      0.4     0.0
           c     16      0.2     0.8
mve207     c_0    5      8.5    17.2
           c      4      --      --
mve301     c_0    8      6.9     4.2
           c      8      6.9     4.2
mve303     c_0   15      7.6     1.9
           c     16      --      --
mve304     c_0    8      3.2     2.0
           c      8      3.2     2.0
mve305     c_0    5      4.5     0.3
           c      7      --      --
mve307     c_0   11      3.3     0.4
           c     12      --      --
mve308     c_0    7      6.0     1.2
           c      8      --      --
mve401     c_0    7      2.3     0.7
           c      8      --      --
mve402     c_0   13     14.0     3.7
           c     14      8.6     9.0
mve403     c_0    8      6.5     2.5
           c      8      6.5     2.5
mve404     c_0    3     14.1     0.9
           c      4      9.8     5.6
mve405a    c_0    6      4.0     0.2
           c      8      2.3     3.6
mve405b    c_0    8      6.9     2.9
           c      8      6.9     2.9
mve405c    c_0    7      5.5     0.8
           c      8      3.6     3.3

a  c_0 is the theoretically derived cutoff to minimize P(F+) + P(F-).
   c is the cutoff actually used in the PLATO Service Program at Chanute.
   A double hyphen (--) marks an entry that is not legible in this copy.

Figure 24. Optimal cutoff for the mastery levels of .70, .75, .80, .85, and .90

lesson = mve104b      subjects = 83       n = 11
mean = 10.1205        SD = 1.7278
α = 2.5012            β = .2174           α21 = 0.8018



8.3 Other Measures Obtained from the Evaluation Study of the Chanute
AFB CBE Project

Correlation Values of Mastery Validation Exam Scores with
Block Test Scores and Gain Scores. The evaluation study of the program,
supported by the Advanced Research Projects Agency, measured some
criterion variables which would be helpful in conducting a validation
study of MVEs. The evaluation study revealed that a substantial number
of examinees were misclassified (Table 32). Since detailed information
on the design used in the evaluation study can be found in Dallman et al.
(1977), just a brief description will be given here.

A 50-item NRT was given at the beginning and the end of the
eight-week Chanute Project, which included 35 on-line lessons. The
35 lessons were divided into four subsets called Block1, Block2, Block3,
and Block4. After a student studied and mastered all lessons in a
block, he took the block test; the block test score was counted in his
final grade for the course. He had to take all four block tests, and
then a posttest was given in order to measure the effectiveness of
the program. Each block test had twenty items which were either
multiple-choice or matching. The coefficient alpha reliabilities were
not calculated because the tests were written on the PLATO system and
the item information was not collected. But α21 was available in the
following chart. Figure 3 gives a flow chart of the testing program.

In order to validate the effectiveness of lessons, four kinds
of correlations were calculated. These correlations are described in
the following paragraphs.

Each Block's test scores were matched with the corresponding
Master Validation Exam scores and the time needed to master the lesson
(mastery time), and their correlations were calculated over the
subjects. These two correlation values of 27 lessons were denoted by
r(B, MVEs) and r(B, time) respectively. Their values are shown in Table 33.

The true gain scores of posttest, x_2, from pretest, x_1,
were estimated by multiple regression procedure; the true score
difference t_2 - t_1 of the observed score difference x_2 - x_1 was
regressed on the post- and pretest scores. It is known that regression of
t_2 - t_1 onto the two variables x_1 and x_2 is the same as regressing
t_2 - t_1 on the scores x_2 - x_1 and the residual score, c_2, of x_2 on
x_2 - x_1 (Tatsuoka, 1975), because the covariance of x_2 - x_1 and c_2
equals zero and both x_2 - x_1 and c_2 are linear combinations of x_1
and x_2. Therefore, the multiple regression R(t_2 - t_1 | x_2, x_1) will
be given as the sum of the regressions R(t_2 - t_1 | x_2 - x_1) and
R(t_2 - t_1 | c_2):

R(t_2 - t_1 | x_2, x_1) = R(t_2 - t_1 | x_2 - x_1) + R(t_2 - t_1 | c_2)






Table 33

The Correlations of Block Tests to MVE Scores and Mastery Time

lesson    r(B, MVEs)    r(B, time)    r(G, MVEs)    r(G, time)

103          .15          -.22           .23          -.38*
104a         .38*         -.33*          .19          -.43*
104b         .36*          --            .44*          --
105          .22          -.08           .20          -.34*
201a         .34*          .12           .44*         -.05
201b         .19          -.25           .38*         -.40*
202a         .17          -.04           .07          -.43*
202b         .26          -.03           .28*         -.07
204          .21          -.21           .11          -.13
205a         .28*         -.24           .18          -.32*
205b         .25          -.08           .15          -.26
206a         .40*         -.21           .13          -.22
206b         .12          -.04          -.02          -.18
206c         .00          -.04           .33*         -.08
207          .28*         -.17           .25          -.27
301          .04          -.08          -.11          -.06
303          .34          -.21           .08          -.05
304          .38          -.27           .42*         -.37
305          .07*         -.19           .31*         -.26
307          .30*         -.23           .41*         -.30*
308          .01           .04           .00          -.07
401          .50*         -.15           .32*         -.21
402          .25          -.14           .46*         -.34*
403          .40*         -.23           .21          -.02
404         -.02           .00           .02          -.33*
405a         .07           .01           .12          -.11
405b         .25          -.06           --           -.12
405c         .37*         -.11           .19          -.07

*significant at p < .05.
A double hyphen (--) marks an entry that is not legible in this copy.



Note that the regression coefficient of the first term is
the reliability of gain scores and that of the second term is the
increment of multiple R^2. The multiple R is .861; hence the reliability
of the multiple regression gain score is R^2 = .7405. The squared multiple
R of the first term, viz., the reliability of x_2 - x_1, is .1047. The
squared multiple R of the second term is the increment, .6358.
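
The figures in this paragraph can be checked with a few lines of
arithmetic. The sketch below only re-adds the two squared multiple
correlations quoted in the text and takes the square root; the variable
names are illustrative assumptions.

```python
import math

# Squared multiple correlations quoted in the text.
r2_difference_score = 0.1047  # reliability of the raw difference x_2 - x_1
r2_increment = 0.6358         # increment contributed by the residual score c_2

r2_total = r2_difference_score + r2_increment
multiple_r = math.sqrt(r2_total)

print(f"R^2 = {r2_total:.4f}, R = {multiple_r:.3f}")
```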

This estimated gain score has a higher reliability than those
of pretest and posttest separately. This score was correlated with MVE
scores and mastery time. Table 33 shows the result. The numbers of
statistically significant correlation values are 12 in Column 2, 1 in
Column 3, 10 in Column 4, and 10 in Column 5. The correlation matrix
of these four variables over 27 lessons, r(G, MVEs), r(G, time),
r(B, MVEs) and r(B, time), is as follows:



                      1        2        3        4

1.  r(G, MVEs)      1.000
2.  r(G, time)      -.377    1.000
3.  r(B, MVEs)       .403    -.275    1.000
4.  r(B, time)      -.235     .520    -.468    1.000



Variables 1 and 3 have a moderate correlation value of .403, and
Variables 2 and 4 also have a moderate correlation value of .520.
The reliability of our gain score has the value of .74, while the four
Block tests in Figure 3 have reliabilities α21 of .56, .33, .47 and
.42 respectively, which are very low. Therefore, we decided to use
only the first two variables, r(G, MVEs) and r(G, time), in subsequent
analyses. They were renamed "gain" and "timeg."

The optimal cutoffs that were evaluated in the previous
subsection, and designated by c_0 in Table 32, were divided by the number
of items of the corresponding Master Validation Exam.



The distance of c_0 from the mean value in each test, c_0 - mean,
was also divided by the number of items of the corresponding Master
Validation Exam in order to make it free from the effect of the test
length of MVEs, and then absolute values were taken. This value
stands for a sort of distance of c_0 from the mean of each test.
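
As a hedged illustration of these two length-free measures, the sketch
below computes c_0/n and |c_0 - mean|/n for lesson mve104b, using the
values reported with Figure 24 (n = 11 items, mean = 10.1205) and the
optimal cutoff c_0 = 9 listed for that lesson. The function names are
assumptions introduced for this example.

```python
def cutoff_ratio(c0: float, n_items: int) -> float:
    """Optimal cutoff expressed as a proportion of the test length."""
    return c0 / n_items


def relative_distance(c0: float, mean_score: float, n_items: int) -> float:
    """Absolute distance of the cutoff from the mean, per item."""
    return abs(c0 - mean_score) / n_items


# Lesson mve104b: c_0 = 9, mean = 10.1205, n = 11 items.
print(f"c_0/n          = {cutoff_ratio(9, 11):.4f}")
print(f"|c_0 - mean|/n = {relative_distance(9, 10.1205, 11):.4f}")
```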


A lesson of the Vehicle Training Program at Chanute Air Force
Base was said to be validated when 90 percent of the students had
achieved the given mastery level of 80 percent of the items answered
correctly in the first attempt on each Master Validation Exam. The
sample consisted of about 30 students from successive classes. No
major modifications of lessons were made until all students in the
sample finished the lessons. All lessons were validated according to
this criterion between April and September of 1975. These lessons were
used without any major change during the evaluation period and were
tested on more students who came in after the validation dates were
established. Table 34 includes the validation date,
the number of examinees who studied the lessons after the lessons were
said to be validated (we call this number "nafter" from now on), the
percentage of students who achieved the given mastery level at the first
try (denoted by % of success), the percentage of students who failed
at the first try, the total number of students (which is equal to 30
plus "nafter") and the number of students who passed the end-of-lesson
test at the first try.
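
The validation rule described above (90 percent of the sample reaching
80 percent correct on the first attempt) can be written as a small
check. This is a sketch under stated assumptions: the first-attempt
scores below are invented for illustration, and the function names do
not come from the Chanute software.

```python
def reaches_mastery(num_correct: int, n_items: int, mastery_level: float = 0.80) -> bool:
    """True if the proportion of items answered correctly meets the mastery level."""
    return num_correct / n_items >= mastery_level


def lesson_is_validated(first_attempt_scores, n_items: int,
                        mastery_level: float = 0.80,
                        required_pass_rate: float = 0.90) -> bool:
    """True if enough of the sample reached mastery on the first attempt."""
    passed = sum(reaches_mastery(s, n_items, mastery_level)
                 for s in first_attempt_scores)
    return passed / len(first_attempt_scores) >= required_pass_rate


# Hypothetical sample of 30 students on a 10-item exam.
scores = [9] * 28 + [6] * 2   # 28 of 30 students reach at least 8 of 10
print(lesson_is_validated(scores, n_items=10))
```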

The Efficiency Index in the last column of Table 35 (see page
105) is aimed at measuring the quality of Chanute lessons. It is
derived from the idea that a good lesson written on the CAI system
will allow a student to spend his/her minimum time to master the
instructional objective. If a lesson is not good, then a student
tends to spend more time than he/she actually needs to master the same
instructional goal in a good lesson. The reader might wonder what is
the definition of a good lesson. The experienced instructional designer
might say that the quality of instruction may be determined by the
appropriateness of instructional cues, and the quality and the type
of reinforcement given each student, as well as the amount of
participation and practice experienced by each student. If the
instructional cues are appropriate and clear, without ambiguous wording
or explanation, then a student must learn the instruction at his/her own
learning rate without wasting his/her time.

Carroll (1963), Carroll and Spearritt (1967), and Atkinson
(1968) studied the various relationships among the quality of
instruction, intelligence and time required for each student to achieve
mastery. Atkinson's findings are especially interesting. They show
that students can achieve the mastery level of different tasks at
different rates and that time variations in learning can be reduced by
improving the quality of instruction. Indeed, high quality lessons
maximized the individual's learning rate. We all know that a bright
student learns very quickly, no matter how poorly a lesson is written.
It seems likely that a mediocre student will be the one who suffers the
most from ambiguous, unclear instructional cues in a poor quality
lesson. If the teaching objective in a lesson does not require
previously acquired knowledge or high intelligence, and is fairly easy,
then average students should master it as quickly as bright students
master it.

How to measure the quality of a lesson became a major concern
in the evaluation study of the Chanute Air Force Base Computer Based
Education Project. They tried to validate a lesson by using success rate
(see Tables 31 and 34), but their attempt was not successful. It is
natural to consider that the quality of instruction can be recognized
by at least two aspects: one is a higher success rate, the other is a
faster learning rate.




Table 34

Summary of Master Validation Exams in the Chanute PLATO IV Project

                  Validation   Size of tested-    % of      % of     Total    # of
Lessons   M(a)    Date         out sample         Success   Failure  N        Success

103       --      10 June       63                89%       11%       93       83
104a      30      14 April     114                94%        6%      144      134
104b      30      14 April     113                86%       14%      143      124
105       30      14 April     102                88%       12%      132      117
106       --      19 June       --                82%       18%       63       54
201a      30      28 May        99                90%       10%      129      116
201b      30      23 May       109                72%       28%      139      105
202a      30      18 Aug        33                82%       18%       63       54
202b      30      28 May        90                98%        2%      120      115
203a      30      28 May        33                97%        3%       63       59
203b      30      13 June       33                94%        6%       63       58
203c      30      18 Aug        33                91%        9%       63       57
204       30      18 Aug        33                94%        6%       63       58
205a      30      15 Jan        33                79%       21%       63       53
205b      30      15 Jan        33                82%       18%       63       54
206a      30      13 June       90                82%       18%      120      101
206b      30      25 June       65                82%       18%       95       80
206c      30      11 April     118                95%       --       148      139
207       30      15 Aug        33                91%        9%       63       57
301       30      25 June      109                79%       21%      139      113
304       30      25 June       65                82%       18%       95       80
305       30      18 May       109                96%        4%      139      132
307       30      14 April     130                81%       19%      160      132
308       30      18 May       109                63%       37%      139       96
401       30      17 April     142                83%       17%      172      146
402       30       8 July       65                79%       21%       95       78
403       30      30 June       65                79%       21%       95       78
404       30       2 Sept       33               100%       --        63       60
405a      30      26 Aug        33               100%        0%       63       60
405b      30      26 Aug        33                91%        9%       63       57
405c      30      26 Aug        33                94%        6%       63       58
405d      30       2 Sept       33                73%       27%       63       51
406       30      30 June       65                95%        5%       95       89
407       30      22 Sept       33                88%       12%       63       56

a  M is the sample size used for establishing validation dates.
   A double hyphen (--) marks an entry that is not legible in this copy.



Tatsuoka (1978) discussed the possibility of using the success
rate as a measure of instructional quality in her paper, and the result
was not favorable. The success rate measure depends on the scores on the
end-of-lesson test, a criterion-referenced test, which has been a
problem in educational measurement. It is dangerous to use a
criterion-referenced test alone as a measure of instructional quality,
and the success rate is contaminated by the problems of
misclassification, false positive and false negative. It is urgent to
establish a method that can measure the quality of instruction directly
without using criterion-referenced testing as an auxiliary means. We
believe our efficiency index provides one such wanted measure. The
procedure for deriving the efficiency index is as follows.



1. The total sample of about 80 subjects was divided into
three groups according to their scores on the aptitude test, the Armed
Services Vocational Aptitude Battery (ASVAB). The test is aimed at
measuring general-technical, mechanical, motor mechanical and
electronics aptitudes for high school seniors, as part of the recruiting
programs of the Army, Navy, and Air Force. The first group consists
of the top 25 percent of the students, the second is the middle
50 percent of the students and the third is the bottom 25 percent of the
students who took the ASVAB. The average mastery times of the three
groups are calculated and summarized in Table 35. The t-test of mean
mastery times for the two groups, Group 1 and Group 2, revealed that
9 out of 27 lessons were statistically significant at p < .05.

2. Lesson MVE201a was arbitrarily picked as the base, and
the mean mastery times of every other lesson in Groups 1 and 2 were
divided by the respective mean mastery times of MVE201a. We calculated
such ratios of the mastery time of 27 Chanute lessons in Groups 1 and 2,
taking the mean mastery time of lesson MVE201a as the base.

3. According to the assumption that a good lesson will not
make the average students slow down to master it in comparison with the
time taken to master the lesson by the brighter students, we divided the
newly calculated 27 ratios, [mean mastery time of lesson X] / [mean
mastery time of MVE201a], in Group 1 by the corresponding ratios in
Group 2, and obtained 27 efficiency indices, which appear in the last
column of Table 35.

If the value of the efficiency index of lesson A is larger than
that of lesson B, then we might be able to say that lesson A is more
efficient than lesson B.
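
The three steps can be sketched in a few lines, assuming each ratio is
taken as the lesson's mean mastery time over that of the base lesson
MVE201a; that orientation is the one under which the indices tabulated
in Table 35 (for example 0.746 for MVE103 and 1.476 for MVE304) are
reproduced from the Group 1 and Group 2 means. The function name is an
illustrative assumption.

```python
def efficiency_index(lesson_g1: float, lesson_g2: float,
                     base_g1: float, base_g2: float) -> float:
    """Group 1 ratio divided by Group 2 ratio, each relative to the base lesson."""
    ratio_g1 = lesson_g1 / base_g1   # Group 1: lesson mean time / base mean time
    ratio_g2 = lesson_g2 / base_g2   # Group 2: lesson mean time / base mean time
    return ratio_g1 / ratio_g2


# Mean mastery times (minutes) of the base lesson MVE201a in Groups 1 and 2.
BASE_G1, BASE_G2 = 11.20, 12.96

# Group 1 and Group 2 mean mastery times for two lessons from Table 35.
lessons = {"MVE103": (21.25, 32.95), "MVE304": (12.83, 10.06)}

for name, (g1, g2) in lessons.items():
    print(name, round(efficiency_index(g1, g2, BASE_G1, BASE_G2), 3))
```

By construction the base lesson itself receives an index of 1.000, and
an index above 1 means the middle group is relatively closer to the top
group on that lesson than it is on the base lesson.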

8.4 The Results of Statistical Analyses Over 27 Chanute Lessons

Nineteen measures were selected and their correlation matrix
was calculated. Table 36 gives a brief description of the 19 variables and
Table 37 is the correlation matrix of these variables.


Table 35

Average Mastery Time, and Efficiency Index

                    Mean and Standard Deviation (Minutes)

              Group 1*           Group 2**          Group 3***        Efficiency
Lesson      Mean     SD        Mean     SD        Mean      SD        Index

MVE103      21.25    5.26      32.95   15.42      43.83    27.11      0.746
MVE104a     23.42    5.09      36.82   13.61      36.25    13.07      0.736
MVE105      31.73    8.19      41.63   12.32      54.43    23.00      0.882
MVE201a     11.20    5.35      12.96    6.31      13.75     7.19      1.000
MVE201b     27.08   16.02      42.46   23.05      52.42    29.05      0.738
MVE202a    142.23   56.22     183.44   73.65     218.14   114.81      0.897
MVE202b     12.46    --        14.58    4.46      14.25     3.41      0.989
MVE204      71.64   31.36     100.76   59.91     102.50    60.03      0.823
MVE205a     86.75   25.32     111.60   47.97     149.17    94.90      0.899
MVE205b     27.90   11.80      44.46   35.94      50.18    29.35      0.726
MVE206a     37.89   12.75      53.10    --        55.00    26.03      0.825
MVE206b     11.00    3.20      20.33   13.50      22.50     7.15      0.626
MVE206c     33.13   15.50      50.95   33.75      41.78    10.69      0.732
MVE207      22.45    5.05      34.50   16.18      43.15    26.86      0.753
MVE301      26.67    9.10      29.81   18.81      29.92    16.98      1.035
MVE303      11.57    3.99      13.50    7.06      15.36     6.99      0.992
MVE304      12.83    7.25      10.06    4.99      15.80     7.71      1.476
MVE305      14.75    3.41      19.90    9.01      21.80     7.15      0.858
MVE307      44.00   12.54      58.22   27.57      86.83    22.74      0.874
MVE308      38.00    5.95      44.71   18.66      42.10    15.77      0.983
MVE401      17.00    3.27      21.06    5.50      26.17     5.42      0.934
MVE402      53.55   18.91      81.69   67.17     114.08    49.75      0.758
MVE403       7.13    1.13      --      16.34      15.86     8.73      0.666
MVE404      10.20    6.14      10.00    5.13      13.44     7.50      1.180
MVE405a     23.00   14.10      25.37    7.21      32.60    13.82      1.049
MVE405b     33.25    9.16      42.47   18.37      39.11    19.33      0.906
MVE405c      9.00    2.38      11.10    8.51      13.00     5.29      0.938

*   The top 25 percent of examinees according to ASVAB scores.
**  The middle 50 percent of examinees according to ASVAB scores.
*** The bottom 25 percent of examinees according to ASVAB scores.
A double hyphen (--) marks an entry that is not legible in this copy.



Table 36

A Brief Description of 19 Variables

Variable
Number    Notation            Description

 1        P(F+)               The probability of false positive
 2        c_0/n               The optimum cutoff ratio so as to minimize
                              misclassifications
 3        α21                 The ratio of true variance to observed
                              variance
 4        P(F+) + P(F-)       The probability of misclassification
 5        nafter              The number of subjects tested after a
                              lesson was declared to be validated
 6        % fail              Observed percentage of failure in MVE
 7        t_0                 The minimum time parameter from Weibull
                              distribution
 8        mc                  Maximum correlation from estimation
                              procedure of Weibull parameters
 9        c                   Shape parameter of Weibull distribution
10        μ_0                 Scale parameter of Weibull distribution
11        P                   Probability value from Kolmogorov-Smirnov
                              test
12        range               Maximum mastery time minus minimum mastery
                              time
13        efficiency index    Relative ratio of mean mastery time of
                              higher aptitude group to mediocre
                              aptitude group
14        gain                Correlation of gain scores with MVE scores
15        timeg               Correlation of gain scores with mastery
                              time
16        items               Number of items in a test
17        |c_0 - mean|/n      Relative distance of c_0 from the mean
18        P(F-)               Probability of false negative
19        P(A or F+)          Probability of pass based on the observed
                              cutoff c



The probability of false positive (or advancement), P(F+),
has correlation values of .931, -.562, -.678, .638 and -.637 with
P(F+) + P(F-), nafter, |c_0 - mean|/n, P(F-) and P(A or F+) respectively.
According to these correlations, when false positive occurs, then
false negative more likely occurs but the observed passing rate,
P(A or F+), more likely declines. That means that the lessons whose
observed passing rate, P(A or F+), is higher tend to have less chance
of false positive (advancement) cases. The test which advances the
students to the next lesson more frequently by mistake tends also to
retain the students whose true scores are really above the mastery level.
The high correlation of P(F+) and |c_0 - mean|/n shows that when the
cutoff c_0 is closer to the mean, then the misclassification of false
advancement tends to occur more often. The correlation value of -.562
with the variable nafter, the number of students who studied a lesson
after the validation date was set (if over 90 percent of the students
pass the mastery level of an MVE, then the lesson was said to be
validated), indicates that the probability P(F+) will be small for the
lessons whose validation dates were established at an earlier date during
the period of the evaluation study of the PLATO program.

Table 37

Correlation Matrix of 19 Variables
(x 1000)

Columns 1 to 10:

         1     2     3     4     5     6     7     8     9    10

 1    1000   353   358   931  -562   111   147  -345  -276   342
 2     353  1000    --   393  -373   167   754   259   -10   614
 3     358    --  1000   -20   -37   384   158    44  -236   196
 4     931   393   -20  1000  -617   165   257  -223  -294   396
 5    -562  -373   -37  -617  1000   335  -261   285   337  -329
 6     111   167   384   165   335  1000   225     1   -20   254
 7     147   754   158   257  -261   225  1000   203    20   762
 8    -345   259    44  -223   285     1   203  1000   389    32
 9    -276   -10  -236  -294   337   -20    20   389  1000    60
10     342   614   196   396  -329   254   762    32    60  1000
11     182   270    89   279   -64    76   318   295   -31   383
12     265    --   213   345  -304   206   755    67   -15   967
13      --    --  -145    -1   -30    --  -167  -101   401  -227
14      --  -244    90  -264   271    --   -90  -205    64   -94
15     183  -233  -259    54   -99  -460  -508   -26   -71  -449
16    -108  -271   172  -211   426    --  -106    98    43    43
17    -678  -441  -626  -662   353  -522  -219   239   228  -440
18     638   542    79   869  -544   293   376   -17  -234   428
19    -637  -558  -189  -853   587  -289  -426    12   298  -473

Columns 11 to 19:

        11    12    13    14    15    16    17    18    19

11    1000   438  -193  -126  -175   157  -259   347  -326
12      --  1000  -289   -74  -414    70  -402   417  -467
13    -193  -289  1000    -1   261  -319   105   -75   148
14    -126   -74    -1  1000  -377   231   181  -196   155
15    -175  -414   261  -377  1000  -190   119  -171   174
16     157    70  -319   231  -190  1000  -123  -264   238
17    -259  -402   105   181   119  -123  1000  -595   640
18     347   417   -75  -196  -171  -264  -595  1000  -974
19    -326  -467   148   155   174   238   640  -974  1000

A double hyphen (--) marks an entry that is not legible in this copy.




This relation is true for the variables P(F+ or F-) and
P(F-) as well, because the correlations of variable "nafter" with them
are -.617 and -.544 respectively. Moreover, P(F+), P(F-) and P(F+ or F-)
correlate highly with variable |c_0 - mean|/n, with the values of -.678,
-.595, and -.662 respectively. But the correlation between "nafter" and
|c_0 - mean|/n is not so high, at .353. Further discussion of the
appropriateness of the procedure by which a lesson is said to be
validated will be found in Tatsuoka (1978).

Variable 17 (|c_0 - mean|/n) correlates significantly with nine
variables, and so does Variable 19 (P(A or F+)). Variable 12 (range)
and Variable 18 (P(F-)) each correlate significantly with eight
variables. Variable 2 (c_0/n) has seven variables, and Variables 4
(P(F+ or F-)) and 10 (μ_0) have six variables, whose correlation values
are significant. In order to clarify the characteristics of the 19
variables, principal component analysis was first performed. The first
five eigenvalues were 6.32, 3.08, 2.12, 1.40 and 1.30 respectively, and
their cumulative percentage was 75 percent of the total variance. The
factor matrix was orthogonally rotated by Varimax analysis and five
factors were selected. Table 38 summarizes variables in each factor with
their factor loadings.
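
The 75 percent figure follows from the eigenvalues, since a correlation
matrix of 19 standardized variables has a total variance of 19. A
minimal sketch of that calculation is given below; the variable names
are illustrative assumptions.

```python
# First five eigenvalues reported for the 19-variable correlation matrix.
eigenvalues = [6.32, 3.08, 2.12, 1.40, 1.30]
n_variables = 19  # total variance of a correlation matrix equals its order

cumulative = 0.0
for rank, value in enumerate(eigenvalues, start=1):
    cumulative += value
    share = 100.0 * cumulative / n_variables
    print(f"component {rank}: eigenvalue {value:.2f}, cumulative {share:.1f}%")
```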

Table 38

The Results of Factor Analysis

Factor 1
    Variable                   Loading
     1   P(F+)                   .89
     4   P(F+ or F-)             .94
     5   nafter                 -.69
    17   |c_0 - mean|/n         -.73
    18   P(F-)                   .81
    19   P(A or F+)             -.81

Factor 2
    Variable                   Loading
     2   c_0/n                   .70
     7   t_0                     .90
    10   μ_0                     .85
    12   range                   .85
    15   timeg                  -.66

Factor 3
    Variable                   Loading
     3   α21                     .62
     6   % fail                  .85
    16   items                   .62
    17   |c_0 - mean|/n         -.58

Factor 4
    Variable                   Loading
     8   mc                     -.78
    11   P                      -.54
    14   gain                   -.61

Factor 5
    Variable                   Loading
     9   c                       .69
    13   efficiency index        .84

Probability Variables 1, 4, 18 and 19 clustered together
with Variables 5 and 17 as Factor 1. Time variables 7, 10, 12 and 15
clustered together with the optimal cutoff c_0. The result most
interesting to the authors was Factor 5, the shape parameter of the
Weibull distribution clustering together with the efficiency index of
lessons. The correlation of c and the efficiency index is .401, which
means that if c is larger, then the lessons tend to have a larger
efficiency index, and hence the difference between the average mean
times of Group 1 and Group 2 becomes smaller, with respect to the
difference of those in Groups 1 and 2 of lesson MVE201a. That means, by
our assumption, that if c is larger, then the corresponding lesson is
teaching students more efficiently. Recall the previously developed
argument that a CRR larger than 1 was interpreted as meaning that
students engaged themselves with the task of solving a problem, and a
CRR smaller than 1 indicated they gave up a given item because it was
too difficult for them to try. These two results from the analysis of
lessons and test items were independently derived in different contexts,
and yet both make sense and sound reasonable. Since aptitude scores are
infrequently available in common practice, it is usually difficult to
obtain the efficiency index we introduced in this report. But mastery
time can be obtained fairly easily from lessons written on a CAI
system, so we hope our research result will be used to measure some
aspect of quality in lessons.

Multiple regression analyses were performed on several
sets of variables. The purpose of the analyses was to see which
variables predict large misclassifications. Variables 3, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16 and 17 were taken as the set of predictors,
and Variable 1, P(F+), was taken as the criterion. Stepwise multiple
regression, with the F values for entry and removal of predictors set
at 2.0, was performed on these variables, and |c_0 - mean|/n with a
t-value of -10.4, α_21 with t = -6.6, % fail with t = -2.6 and c with
t = -2.0 were selected to predict P(F+), the probability of a false
positive. Multiple R was .921; R corrected for shrinkage was .828.

A second analysis was performed on the same set of predictors
with Variable 18, P(F-), as the criterion. A multiple R of .912 and an
R corrected for shrinkage of .782 were obtained with the predictors
|c_0 - mean|/n, α_21, maximum correlation, number of items, c, and timeg
(correlation of gain and MVE scores).

A third analysis was done on the same set of predictors with
Variable 4, P(F+) + P(F-), as the criterion. The result was very
similar to those of the first and second analyses. The predictors are
|c_0 - mean|/n, α_21, maximum correlation, c, the correlation of gain
and MVE scores, nafter, and the number of items in a test. Multiple R
is .970 and R corrected for shrinkage is .918. These results are
summarized in Tables 39, 40 and 41.






Table 39

Relationship between P(F+) and Other Variables

Variable                Beta-Coefficient   Standard Error       t

 3  α_21                     -.709              .108           -6.6
 6  % fail                   -.256              .099           -2.6
 9  c                        -.172              .087           -2.0
17  |c_0 - mean|/n          -1.216              .117          -10.4

Multiple R = .921, Corrected R for shrinkage = .828, F(4,22) = 30.543
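The summary statistics under Table 39 can be checked against each other. The sketch below assumes n = 27 lessons (inferred from the 22 error degrees of freedom with four predictors, not stated in this section) and assumes a shrinkage correction of the form 1 - (1 - R^2)(n - 1)/(n - k), chosen only because it reproduces the reported value; the formula actually used is not given here. The function names are ours.

```python
def f_statistic(r, k, n):
    """Overall F for a regression with k predictors fitted to n cases."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def shrunken_r2(r, k, n):
    """Assumed shrinkage correction: 1 - (1 - R^2)(n - 1)/(n - k)."""
    return 1.0 - (1.0 - r * r) * (n - 1) / (n - k)


# Table 39: multiple R = .921 with four predictors; n = 27 is an assumption.
R, K, N = 0.921, 4, 27
print(round(f_statistic(R, K, N), 1))   # close to the reported 30.543
print(round(shrunken_r2(R, K, N), 3))   # close to the reported .828

# Each t-value is the beta coefficient divided by its standard error,
# e.g. the |c_0 - mean|/n row of Table 39.
print(round(-1.216 / 0.117, 1))
```

The small gap between the computed F and the printed one is what rounding R to three decimals would produce.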




Table 40

Relationship between P(F-) and Other Variables

Variable                Beta-Coefficient   Standard Error       t

 3  α_21                     -.793              .141           -5.6
 8  mc                        .529              .123            4.3
 9  c                        -.368              .110           -3.4
15  timeg                    -.238              .106           -2.2
16  items                    -.409              .101           -4.1
17  |c_0 - mean|/n          -1.198              .145           -8.3

Multiple R = .912, Corrected R for shrinkage = .782, F(7,19) = 13.446




Table 41

Relationship between P(F+ or F-) and Other Variables

Variable                Beta-Coefficient   Standard Error       t

 3  α_21                     -.864              .088           -9.8
 5  nafter                   -.195              .077           -2.5
 8  mc                        .377              .079            4.8
 9  c                        -.362              .080           -4.5
14  gain                      .224              .070            3.2
16  items                    -.161              .073           -2.2
17  |c_0 - mean|/n          -1.216              .096          -12.6

Multiple R = .970, Corrected R for shrinkage = .918, F = 35.382



Variable |c_0 - mean|/n is a common predictor of the three criterion
variables, and its t-values of -10.4, -12.6 and -8.3 are the largest
among the predictors. This result is expected from the nature of the
beta-binomial model, but the appearance of α_21 as the second strongest
predictor in the three analyses is surprising. If α_21 is high enough,
then the probability of each of the three errors (false positive, false
negative and either misclassification) will be minimized. Most Master
Validation Exams have reliabilities of around .4 to .5, which is quite
low, so it is natural to expect that misclassifications will have
occurred quite frequently in the program.

The variable α_21 does not correlate significantly with
Variable 16, the number of items in the tests; it correlates with
Variable 6, percentage of failure, at the 5 percent significance level.
This relationship may be interesting to investigate further, especially
when the test lengths are short and about the same, 10 to 15 items, as
is typical for criterion-referenced tests. It is apparent that α_21
is a strong predictor of the three criteria, with beta values of
-.709, -.865 and -.793 respectively, and therefore internal consistency
is an important factor for controlling the occurrence of
misclassifications in a criterion-referenced test. Figure 25 is a copy
of the PLATO screen on which the relationship between P(F+) + P(F-)
and α_21 was plotted. The curves in Figure 25 show P(F+) + P(F-) on the
y-axis against α_21 on the x-axis for a test whose mean is 8.907 and
whose length is 10. When the cutoffs are 7, 8 and 10, the corresponding
curves go down as α_21 becomes larger. The curve for cutoff 6 has its
optimum value at around α_21 = .6, but it goes down as α_21 increases.
If the internal consistency α_21 of the test is between .53 and 1, then
cutoff 7 minimizes the probability of F+ or F-. If α_21 of the test
is less than .53, then the optimum cutoff will be 6. Thus the optimum
cutoff score for minimizing misclassifications depends on α_21. This
fact will be a useful guide for constructing a criterion-referenced
test so that misclassifications, both false positives and false
negatives, can be minimized. The most interesting result is that the
shape parameter c appears in all three cases as a predictor, with beta
values of -.172, -.362 and -.368 respectively. If the lesson has a
larger c value, then the probabilities of misclassification, false
positive and false negative become smaller. Even though P(F+),
P(F-) and P(F+) + P(F-) are determined by such variables as the number
of items, the mean of the CR test scores and α_21, which are obtained
purely from a test, the value of the shape parameter c of the Weibull
distribution entered as a common predictor in the three
misclassification cases. This implies that some factor related to the
quality of a lesson, or to the conditional mastery rate of the lesson
(the conditional probability that a student who has not mastered the
lesson at time t will master it at the next moment, t + Δt), affects
the possibility of misclassification when judgments are based on the
scores on the end-of-lesson test.
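How the misclassification probabilities move with α_21 and the cutoff can be explored with a small simulation under the beta-binomial model. This is only a sketch under stated assumptions: the true-mastery threshold pi0 is hypothetical (this section does not give the criterion used for Figure 25), the beta parameters are derived from the test mean and α_21 in the usual beta-binomial way, and the function names are ours. It is not the program that produced Figure 25.

```python
import random


def beta_params(mean, n, alpha21):
    """Beta parameters implied by the test mean, length n and KR-21."""
    a = (1.0 / alpha21 - 1.0) * mean
    b = n / alpha21 - n - a
    return a, b


def misclassification(mean, n, alpha21, cutoff, pi0, draws=5000, seed=0):
    """Monte Carlo estimates of P(F+) and P(F-) for one cutoff score."""
    rng = random.Random(seed)
    a, b = beta_params(mean, n, alpha21)
    false_pos = false_neg = 0
    for _ in range(draws):
        p = rng.betavariate(a, b)                    # true proportion correct
        x = sum(rng.random() < p for _ in range(n))  # observed test score
        if x >= cutoff and p < pi0:
            false_pos += 1                           # non-master passes
        elif x < cutoff and p >= pi0:
            false_neg += 1                           # master fails
    return false_pos / draws, false_neg / draws


if __name__ == "__main__":
    # Test mean 8.907 and length 10 as in Figure 25; pi0 = 0.7 is hypothetical.
    for alpha21 in (0.3, 0.5, 0.7):
        for cutoff in (6, 7, 8):
            fp, fn = misclassification(8.907, 10, alpha21, cutoff, pi0=0.7)
            print(alpha21, cutoff, round(fp + fn, 3))
```

Changing pi0 changes which cutoff looks best, so the printed values should be read as an illustration of the dependence on α_21 rather than a reproduction of the curves.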




In the last analysis, Variable 2, c_0/n, was taken as the
criterion and all other variables as predictors. Stepwise regression
analysis selected the predictors t_0, |c_0 - mean|/n, items and mc,
with a multiple R of .875 and an R corrected for shrinkage of .735; the
F value for this regression was 17.98. Table 42 shows the result of the
analysis.



Table 42

Relationship between c_0 and Other Variables

Variable                Beta-Coefficient   Standard Error       t

 7  t_0                       .578              .112            5.2
 8  mc                        .269              .112            2.4
16  items                    -.287              .107           -2.7
17  |c_0 - mean|/n           -.414              .113           -3.7

Multiple R = .875, Corrected R for shrinkage = .735, F(4,22) = 17.984



It is surprising to see that t_0, the location parameter of the Weibull
distribution, enters as the strongest predictor of the optimum cutoff
score c_0, with a t-value of 5.2. The beta coefficient of .578 indicates
that lessons with larger t_0 values tend to have larger c_0/n. Note
that c_0 came in together with t_0, y_0, range and timeg. The percentage
score of the optimum cutoff, |c_0|/n, showed a strong relationship with
the time variables of the corresponding lesson. We do not know how to
interpret this result.

The major conclusion of this section is that misclassifications,
false positive and false negative, are mainly affected by three
factors: how close to the mean of a test the cutoff was selected,
the internal consistency of the test, and the conditional mastery rate
of the lesson.






9. SUMMARY AND CONCLUSIONS



This study investigated the feasibility of using the family
of Weibull distributions, a family which is widely used in
system-reliability analysis, as a model for the distributions of time
scores (response times) of items in criterion-referenced tests, lesson
segments and entire lessons that were implemented on the PLATO system.
The items were those of a series of matrix algebra tests developed for
the dual purpose of use in this study and of testing students in three
statistics courses at UIUC both before and after they studied our
matrix algebra course. The latter provided the lesson segments
(including exercises), while the entire lessons came from the Chanute
AFB CBE Project and deal with special and general vehicle maintenance
training.

The fits of the Weibull distributions to these various
observed distributions were, on the whole, very good to excellent as
gauged by the Kolmogorov-Smirnov goodness-of-fit test. However, for
some items (most of which possessed certain exceptional properties in
common) the two-parameter gamma distribution offered better fits. The
same held true with even greater force for the exercises occurring in
the matrix algebra lessons. Tentative explanations of when and why
the gamma was better than the Weibull were advanced, but discovery of
definitive reasons must await future research.
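The kind of goodness-of-fit check referred to above can be sketched as follows. The code computes the Kolmogorov-Smirnov statistic between an empirical distribution of response times and a three-parameter Weibull with location t_0, scale y_0 and shape c. The data are simulated and the parameter values are hypothetical; the standard Weibull form is assumed, and the function names are ours.

```python
import math
import random


def weibull_cdf(t, t0, y0, c):
    """Three-parameter Weibull CDF with location t0, scale y0, shape c."""
    if t <= t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / y0) ** c))


def ks_statistic(times, cdf):
    """Largest gap between the empirical CDF of `times` and a model CDF."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


rng = random.Random(1)
# Hypothetical response times in seconds: t0 = 5, y0 = 20, c = 1.5.
sample = [5.0 + rng.weibullvariate(20.0, 1.5) for _ in range(200)]
d = ks_statistic(sample, lambda t: weibull_cdf(t, 5.0, 20.0, 1.5))

# 1.36 / sqrt(n) is the large-sample 5 percent critical value when the
# model parameters are fixed in advance rather than estimated.
print(round(d, 3), d < 1.36 / math.sqrt(len(sample)))
```

When the parameters are estimated from the same data, as they were in this study, the tabled critical value is conservative, so the comparison is only a rough guide.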

Interpretations of the three Weibull parameters (the
theoretical minimum time or location parameter t_0, the scale parameter
y_0, which is closely related to the mean, and the most interesting
although sometimes "recalcitrant" shape parameter c) were given in
terms of psychometric properties of the achievement test items. The
last-mentioned parameter was found by correlational analysis to be
moderately related to two kinds of item difficulty index: the
traditional proportion passing and a more subtle one developed very
recently by Irmgard Loeschner (personal communication). It was also
believed to be related to what might be called the "degree of
engagement or involvement" of the student with the task, and further to
be associated with the degree of familiarity with it. Both of these are
akin to, but conceptually different from, difficulty.

A function related to, and partially determined by, the shape
parameter c is what we dubbed the conditional response rate (CRR),
which is called the hazard rate in the system-reliability literature.
This is the conditional probability that an examinee who has not
responded to an item (or lesson segment, etc.) up to time t will
respond to it within the following infinitesimal interval [t, t + Δt].
When c < 1 the CRR is a monotonically decreasing function of t, and the
implication is that students give up early in trying to solve such an
item. This typically occurs in pretest items, while the same items
given after the instruction usually have c > 1, and the CRR is a
monotonically increasing curve. However, anomalous items (of which
there were three in one of our subtests) that involve material not
covered in the lesson will behave like pretest items even when given in
the posttest. In a way, therefore, one might say that the Weibull shape
parameter c relates also to how well an item "matches" the
instructional content. If the match is poor (as it was in the three
anomalous items), then the students will get frustrated and angry (as
they did) and will quit trying early, which will be reflected in c
becoming less than 1. If the match is good, on the other hand, the
students will by and large become ego-involved and will engage
themselves deeply with the items, thus resulting in c > 1, which leads
to an increasing CRR, implying that the longer a student perseveres
with the item the greater the chances that he/she will answer it.
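The dependence of the CRR on the shape parameter can be seen numerically. The sketch below evaluates the hazard rate of a Weibull distribution in its standard form, (c / y_0) * ((t - t_0) / y_0)^(c - 1), at a few time points; the parameter values are hypothetical and the function name is ours.

```python
def weibull_crr(t, t0, y0, c):
    """Conditional response rate (hazard) of a Weibull at time t > t0."""
    return (c / y0) * ((t - t0) / y0) ** (c - 1.0)


times = [2.0, 5.0, 10.0, 20.0, 40.0]
for c in (0.7, 1.0, 1.5):
    rates = [weibull_crr(t, 0.0, 10.0, c) for t in times]
    # c < 1 gives a falling sequence, c = 1 a constant one,
    # and c > 1 a rising one.
    print(c, [round(r, 4) for r in rates])
```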

A relatively trivial point, but nonetheless one which bears
passing mention, is the fact that the location parameter t_0 estimated
for the group which got an item right (i.e., the "OK subgroup" as we
have been calling it) gives a good idea of the minimum time that should
be allotted for answering that item.

Another finding is that the time-score distribution of an
item which requires only simple mechanical subtasks for its execution
is generally fitted better by a two-parameter gamma than by a Weibull
distribution. As mentioned in Section 2.2, a two-parameter gamma
distribution [see equation (2.9), p. 10] with integer-valued c (> 1) is
a c-fold convolution of one-parameter negative exponential
distributions. Such distributions fit well the time distribution of a
simple task with but one stage; hence their c-fold convolution should
fit a problem consisting of c independent stages, each of which is
simple and mechanical. Thus the finding just cited makes good intuitive
sense.
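The convolution argument can be illustrated by simulation: summing c independent exponential stage times should give totals whose mean and variance agree with a gamma distribution of shape c. The stage count and mean stage time below are hypothetical, and the function name is ours.

```python
import random
import statistics


def staged_time(rng, stages, mean_stage):
    """Total time for `stages` independent exponential stages."""
    return sum(rng.expovariate(1.0 / mean_stage) for _ in range(stages))


rng = random.Random(7)
stages, mean_stage = 3, 4.0          # hypothetical: 3 stages of 4 s each
totals = [staged_time(rng, stages, mean_stage) for _ in range(20000)]

# A gamma with shape c = stages and scale = mean_stage has
# mean c * scale and variance c * scale ** 2.
print(round(statistics.mean(totals), 2), stages * mean_stage)
print(round(statistics.variance(totals), 2), stages * mean_stage ** 2)
```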

We also found some evidence to support the thesis that the
shape parameter c is a more sensitive measure of the "conceptual
difficulty" of an item than is the traditional difficulty index. This
was done by identifying five sets of items that respectively had the
same difficulty in the traditional sense but differed considerably
in their c values. For example, both of the following items were
correctly answered by 29 percent of our sample: (1) "If AB = AC, then
is B = C?" (2) An item calling for the inverse of a 2x2 matrix. Yet
c = 1.01 for the first and c = 1.24 for the second, and certainly it
can be argued that the latter is conceptually more difficult than the
former.

A new measure, which we named the "efficiency index" of a
lesson, was defined as follows. The total sample of students is
divided into three groups on the basis of scores on an aptitude test
relevant to the subject matter of the lessons (say A and B) whose
relative efficiencies or qualities are to be compared. The groups are,
for instance, the top 25 percent (group 1), the middle 50 percent
(group 2) and the bottom 25 percent (which is discarded from further
consideration). We assume that there are other lessons in the same
or similar subject matter that have also been studied by our sample
of students, and one of them is arbitrarily chosen as a "reference
lesson" (R). The average times taken by group 1 and group 2 to master
the reference lesson are divided respectively by the mean mastery times
of Lesson A and Lesson B in the two groups. Writing $\bar{T}_{Xg}$ for
the mean mastery time of group g on lesson X, we now have four ratios,

$$\bar{T}_{R1}/\bar{T}_{A1},\quad \bar{T}_{R2}/\bar{T}_{A2},\quad \bar{T}_{R1}/\bar{T}_{B1}\quad\text{and}\quad \bar{T}_{R2}/\bar{T}_{B2},\ \text{say.}$$

Finally, we take the pairwise ratios of these ratios, thus:

$$E_{A(R)} = \frac{\bar{T}_{R1}/\bar{T}_{A1}}{\bar{T}_{R2}/\bar{T}_{A2}} \quad\text{and}\quad E_{B(R)} = \frac{\bar{T}_{R1}/\bar{T}_{B1}}{\bar{T}_{R2}/\bar{T}_{B2}}.$$

On the reasonable assumption that a "good" lesson will not require
group 2 (average aptitude) students much more time than group 1 (high
aptitude) students to master it, while a "poor" lesson will show a
larger discrepancy in mastery times, the ratios E_A(R) and E_B(R)
defined above will represent the relative efficiencies of lessons A and
B: the one with the larger ratio is the more efficient lesson. If
there are more lessons to be compared, there will be more such
efficiency indices, and the lessons will be rank ordered by them.
(The rank ordering will be invariant of what lesson is chosen as the
reference lesson.)
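The invariance of the rank ordering under a change of reference lesson can be checked with a short sketch. It assumes the index is the group-1 time ratio (reference over lesson) divided by the group-2 time ratio; the mastery times and lesson names below are hypothetical, and the function name is ours. Because the reference lesson enters every index through the same factor, the ordering of the lessons is the same for any reference.

```python
def efficiency_index(ref, lesson):
    """Group-1 ratio (reference / lesson) divided by the group-2 ratio."""
    ratio_g1 = ref[0] / lesson[0]
    ratio_g2 = ref[1] / lesson[1]
    return ratio_g1 / ratio_g2


# Hypothetical mean mastery times in minutes: (group 1, group 2).
lessons = {"A": (30.0, 36.0), "B": (25.0, 45.0), "C": (40.0, 52.0)}
references = {"R1": (20.0, 28.0), "R2": (50.0, 55.0)}

for name, ref in references.items():
    ranked = sorted(lessons, key=lambda k: efficiency_index(ref, lessons[k]))
    print(name, ranked)
```

Both reference lessons give the same ordering of A, B and C, although the index values themselves differ.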

When a factor analysis followed by varimax rotation was
carried out on 19 variables including our efficiency index and the
Weibull shape parameter c, a distinct factor was found that loaded
only these two variables. We thus find yet another piece of evidence
of the meaningfulness of parameter c.

The relationship between the probability P(F+) of a false
positive (calling a non-master a master on the basis of a
criterion-referenced test), the probability P(F-) of a false negative
(calling a master a non-master) and the probability P(F+ or F-) of
either misclassification on the one hand, and the three Weibull
parameters and other psychometric properties of tests such as α_21 and
|c_0 - mean|/n (the distance between the mean and the theoretical
cutoff point for declaring "masterhood," adjusted for test length) on
the other, was examined by stepwise multiple regression analysis. It
turned out that the shape parameter c was one of the strongest
predictors of P(F+) and of P(F-), along with α_21 and |c_0 - mean|/n.
The direction of the relationship so far as c is concerned was that
the larger the c, the smaller the P(F+) and P(F-). (Actually, the same
directionality of relationship held for α_21 and |c_0 - mean|/n as
well.) Hence we may conclude that, although one way to minimize
misclassifications is naturally to use the optimal cutoff point, that
alone is insufficient. We may still have quite large P(F+), P(F-) and
P(F+ or F-) values for some tests unless internal consistency (α_21)
and c (a surrogate measure of efficiency of instruction) are also high.

One incidental but, in our mind, important and interesting
finding was that item discrimination power appears to be an
"inverted-U" type function of the time allowed for completing that
item. This is how we arrived at this conclusion.



Carroll (1963) emphasized in his "model of school learning"
the importance of differences in the time required to learn and
asserted that learning rate was an important source of individual
differences in educability. A study conducted by one of the present
authors during the past year showed that the time needed to complete
certain tests correlates with aptitude scores more significantly than
do the scores on the tests. Sato and his coworkers (1973, 1975) and
Tatsuoka and Tatsuoka (1978) have studied the statistical aspects of
time-score distributions and their characteristics. When a test item
is easy, there is an optimal time point within a relatively short time
interval at which the discriminating power of the item becomes the
largest. On the other hand, for difficult items, the longer the time
allowed the better the discriminating power. Figure 10 (p. 41) is a
copy of the PLATO screen display of plots of the discriminating powers
of an item in our matrix algebra test against 10 time points obtained
in the following manner. The subjects were first arranged in ascending
order of the time they took to respond to a given item. The first
(leftmost) point in the figure was obtained as follows. Only those
who got the item right and were in the fastest 10 percent of the group
were given a score of 1. Everyone in the remaining 90 percent of the
group got a score of 0 even if they got the item right. The
point-biserial correlation coefficient calculated between the item
score thus defined and the modified total score is the ordinate of the
first point (10, .02) in Figure 10. Next, only those who got the item
right and were in the fastest 20 percent were scored 1, and the others
were scored 0 on the item, and the total score was accordingly
modified. The point-biserial correlation thus calculated is the
ordinate of the second plotted point (20, .14). The same process was
repeated for the remaining cutoff percentages, 30 percent, 40 percent,
..., 90 percent, yielding adjusted discriminating powers .27, .46, ...,
.15 respectively.
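The procedure just described can be sketched in a few lines. The sketch assumes that the "modified total score" is the examinee's score on the other items plus the time-restricted item score; that reading, the small data set and the function names are ours.

```python
import math


def point_biserial(binary, scores):
    """Pearson correlation between a 0/1 item score and a numeric score."""
    n = len(binary)
    mb, ms = sum(binary) / n, sum(scores) / n
    cov = sum((b - mb) * (s - ms) for b, s in zip(binary, scores))
    vb = sum((b - mb) ** 2 for b in binary)
    vs = sum((s - ms) ** 2 for s in scores)
    if vb == 0 or vs == 0:
        return float("nan")          # undefined when a score has no variance
    return cov / math.sqrt(vb * vs)


def timed_discrimination(correct, times, rest_scores,
                         percents=range(10, 101, 10)):
    """Discrimination of one item when only the fastest p percent can score."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    result = []
    for p in percents:
        fast = set(order[: max(1, round(n * p / 100))])
        item = [1 if (i in fast and correct[i]) else 0 for i in range(n)]
        total = [rest_scores[i] + item[i] for i in range(n)]
        result.append((p, point_biserial(item, total)))
    return result


# Hypothetical data for ten examinees.
correct = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
times = [12, 15, 40, 18, 55, 22, 30, 60, 25, 35]
rest_scores = [9, 8, 3, 8, 2, 7, 6, 1, 7, 5]
for p, r in timed_discrimination(correct, times, rest_scores):
    print(p, round(r, 2))
```

Plotting the second column against the first gives the kind of curve shown in Figure 10.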

The limitations of this study are many in number, perhaps
the chief of which is the fact that it is not experimental in the
sense of having a neat design and experimenter-manipulated independent
variables. It is, rather, a status study from which, of course, causal
relations cannot be definitively concluded but only inferred and hinted
at. On the other hand, it has the strength of having been conducted
in a real CAI classroom situation yielding "dirty" data instead of the
"antiseptic" data that often accrue from tightly controlled laboratory
experiments, which are hence frequently criticized as bearing little
relationship with real life. (To be sure, some of the dirty data
were "laundered" to the extent that they meet the minimal demands of
analyzability, not to fit our preconceived theory of course, but dirty
and "real-lifeish" they nevertheless remained.)

Other weaknesses, as mentioned in the main text, were (1)
that the parameter-estimation procedures were not the best conceivable
or even available (we learned too late of the best existing method,
viz., an iterative maximum-likelihood approach); and (2) that we did
not consider two-component composite Weibull distributions, which
probably would have fit the total sample without our having to
partition it into the "OK" and "NO" subgroups, those who answered an
item (or exercise, etc.) right or wrong, respectively.

As of this writing we have in fact implemented on the PLATO
system a program for the iterative maximum-likelihood method (adapted
from the FORTRAN printout kindly supplied to us gratis by Dr. H. Leon
Harter of the Wright-Patterson AFB, Ohio) which, mutatis mutandis,
is usable for estimating the parameters of both the three-parameter
gamma and the Weibull distributions in the best possible way given
the state of the art. We intend to do this, as well as experiment
with composite Weibull distributions, in the near future.

Thus, we would be the first to concede that we have barely
scraped the surface in studying the utility of response time (time
scores) along with performance scores for analyzing and evaluating
data from criterion-referenced tests, both for the purpose of assessing
the quality of the tests themselves and for improved testing of the
examinees' abilities.

Nevertheless, we believe that we have at least demonstrated
the feasibility of this approach and hope to have shown that further
research along these lines is warranted. In particular, the Weibull
distribution in its two-parameter form (which we used in this study),
three-parameter form, or two-component composite form, long used by
system-reliability analysts but apparently not widely known among
educational and psychological researchers, seems to bear further
investigation for this purpose.




REFERENCES

Atkinson, R.C. Computer-based instruction in initial reading.
In Proceedings of the 1967 invitational conference on testing
problems. Princeton, Educational Testing Service, 1968, 58-67.

Bargman, R.E., Association in a class of growth functions. Urbana,
University of Illinois, 1966.

Block, J.H. (Ed.) Mastery learning: theory and practice. New York:
Holt, Rinehart & Winston, 1971.

Bree, D.S. The distribution of problem-solving times: an examination
of the stage model. British Journal of Mathematical and Statistical
Psychology, 1975, 28, 177-200.

Carroll, J.B., A model of school learning. Teachers College Record,
1963, 64, 723-733.

Carroll, J.B., & Spearitt, D., A study of a model of school learning.
Monograph No. 4. Cambridge, Massachusetts: Harvard University,
Center for Research and Development of Educational Differences,
1967.

Emrick, J.A. An evaluation model for mastery testing. Journal of
Educational Measurement, 1971, 8, 321-326.

Guttman, L., The quantification of a class of attributes: a theory
and method of scale construction. In P. Horst (Ed.),
The prediction of personal adjustment. Social Science Research
Council, Bulletin 48, 1941, 321-345.

Harter, H.L. & Moore, A.H., Maximum likelihood estimation of the
parameters of the Gamma and Weibull populations from complete and
from censored samples, Technometrics, 1965, 7, 639-643.

Harris, C.W., An interpretation of Livingston's reliability coefficient
for criterion-referenced tests. Journal of Educational Measurement,
1972, 9, 27-29.

Huynh, H. Statistical consideration of mastery scores. Psychometrika,
1976, 41, 43-64.

John, M.V., Jr. & Lieberman, G.J., An exact asymptotically efficient
confidence bound for reliability in the case of the Weibull
distribution, Technometrics, 1966, 8, 135-175.






Keats, J.A. & Lord, F.M., A theoretical distribution for mental test
scores. Psychometrika, 1962, 27, 59-72.

Lennon, G.H., Maximum likelihood estimation for the three parameter
Weibull distribution based on censored samples, Technometrics,
(to appear).

Linn, R.L. Personal communication, October 1, 1978.

Livingston, S.A., Criterion-referenced applications of classical
test theory. Journal of Educational Measurement, 1972, 13-25.

Loeschner, I., Personal communication, February 11, 1978.

Lord, F.M. & Novick, M.R., Statistical theories of mental test
scores. Reading: Addison-Wesley, 1968.

Mann, N.R., Tables for obtaining the best linear invariant estimates
of parameters of Weibull distribution, Technometrics, 1967, 9,
629-645.

Mann, N.R., Optimum estimators for linear function of location and
scale parameters, Annals of Mathematical Statistics, 1969, 40,
2149-2155.

Mann, N.R., Schafer, R.E. & Singpurwalla, N.D. Methods for statistical
analysis of reliability and life data. John Wiley & Sons,
New York, 1974.

Millman, J. Tables for determining number of items needed on
domain-referenced tests and number of students to be tested.
Los Angeles: Instructional Objectives Exchange, Technical Paper
No. 5, April 1972.

Millman, J. Passing scores and test lengths for domain-referenced
measures. Review of Educational Research, 1973, 43, 205-215.

Novick, M.R., The axioms and principal results of classical test theory.
Journal of Mathematical Psychology, 1966, 3, 1-18.
1961,4, 321-324. 

Novick, M.R. & Lewis, C. Prescribing test length for criterion-
referenced measurement. In C.W. Harris, M.C. Alkin, &
W.J. Popham (Eds.), Problems in criterion-referenced measurement.
Los Angeles: UCLA Graduate School of Education, Center for
the Study of Evaluation, 1974.






Appendix A
Sample Pages of Matrix
Algebra Lessons



The numerical entries are called the
elements of the matrix.
A particular element is specified by
the number of the ROW and the number
of the COLUMN in which it occurs.

[Screen display: matrix A, a 4x3 matrix with its 3 columns marked;
the entries are not legible in this copy.]

What is the 1st row vector?
1st, 2nd, 3rd elements



Press NEXT to continue, SHIFT-HELP for more,
BACK to see previous page






3.3 Evaluation of the determinant of a matrix

SARRUS' RULE

Next, let us show you the way
to evaluate a 3rd order determinant
by Sarrus' rule.

Copy the first two columns over again,
and connect each of the three first-row elements
of A with the two numbers located "southeast"
of it by solid lines.

Similarly, connect each of the three third-row
elements by THICK solid lines with the
two numbers located "northeast" of it.
Note that solid lines produce epsilon
value +1, and the thick lines produce -1.
Thus, the value of determinant A is

$$|A| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}.$$

Press -NEXT-
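The rule on this lesson page translates directly into code. The sketch below is ours, not part of the PLATO lesson: the three "southeast" products are added and the three "northeast" products are subtracted. The example matrix is arbitrary.

```python
def sarrus(a):
    """Determinant of a 3x3 matrix (a list of three rows) by Sarrus' rule."""
    plus = (a[0][0] * a[1][1] * a[2][2]
            + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1])
    minus = (a[0][2] * a[1][1] * a[2][0]
             + a[0][0] * a[1][2] * a[2][1]
             + a[0][1] * a[1][0] * a[2][2])
    return plus - minus


print(sarrus([[2, 0, 1], [1, 3, -1], [0, 5, 4]]))   # prints 39
```

The rule applies only to 3rd order determinants; larger ones need cofactor expansion or elimination.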



We just obtained the relations

    OA = (cos30°)*5 + (sin30°)*5
    OB = (-sin30°)*5 + (cos30°)*5

Substituting cos30°, sin30° by their
values,

    OA = .866*5 + .5*5 = 6.83
    OB = -.5*5 + .866*5 = 1.83.

Thus, OP is represented by [6.83, 1.83]
using the new axes Y1 and Y2.

[Screen display: point P plotted at [6.83, 1.83] relative to the
rotated axes; the original axes are scaled from 0 to 8.]

Press NEXT
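The arithmetic on this lesson page can be reproduced with a short sketch (ours, not part of the PLATO lesson). It assumes, as the relations on the screen suggest, that P has coordinates (5, 5) on the original axes and that the new axes are rotated by 30 degrees.

```python
import math


def rotate_axes(x, y, degrees):
    """Coordinates of the point (x, y) relative to axes rotated by `degrees`."""
    t = math.radians(degrees)
    return (math.cos(t) * x + math.sin(t) * y,
            -math.sin(t) * x + math.cos(t) * y)


print(tuple(round(v, 2) for v in rotate_axes(5, 5, 30)))   # (6.83, 1.83)
```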



APPENDIX B
PRETEST FOR MATRIX ALGEBRA

Those who have little or no background in matrix algebra may be unable
to answer many of the items below. You may skip by pressing the NEXT
key without answering.

You may then come back to the test after taking one or more lessons.

This test will provide you with some feedback so that you may choose
only the lessons you have to learn from the five lessons in the index.

To start press -NEXT-

1) Choose the right answer. 





3 


7 




-i 


8 
















-5 


7 


+ 


3 


9 


= i 










a) 


4 


-1 




b) 


2 


15 


c) 


-4 


15 


d) 


10 7 ' 




-8 


-2 






-2 


16 




-8 


16 




-2 12 



2) Choose the right answer.



6 



1 
-1 



0 
-6 



= 9 



a) 


-2 2 


6) ■ 


0 


1 




c) 


0 5 


d) 


0, 2 




7 l'5 




5 


3 

J 






2" 3' 




3 3 



3) Choose the right answer.



8 -1 
0 -7 



3 
-1 



= 9 





















a) . 


9 -4 


b) 


7 2 


c) 


-4 


15 


d) 


10 7 




5 -6 




-5 -8 




-8 


16 




-2 12 






4) Choose the right answer. 



-1 ,. -1 
2 6 



6 10 

9 ' -1 
. . t J 



s f 



a) 



-7 
-7 



-11 
7 



b) 



5 
11 



9 
.5 



c) 



-7 -7 

-11 7 
— i 



d) 



-1 
7 



5) Choose the right answer. 



8 -i 

0- -7 



x 10 = ? 



a) 



80 
0 



b) 



80 -1 
0 -70 



6) Choose the right answer.



c) 



80 
0 



-10 
-70 



d) 



80 
0 



7 

-8 



2 
-2 



x 5 = ? 



a) 



35 2 
-8 -10 



b) 



35 10 
-40 -10 



7) Choose the right answer.
2 4 



9- -1 



a) 



-1 

9 



b) 



1 
9 



c) 



35 -40 
10 -10 



1 

0 



d) 



35 , 2 
-.10 -10 



d) 



9 
-1 






8) Choose the right answer.



-5 \2 
6' 4 6 



2 '5 
-4 -4 



9) Choose the right answer. 

1 



X ? 



r 








7 


-5 








-9 x 


8 


2 









a) 


-3 


3 


b) 


2 ; 2 


c) 


-3 


2 


d)- 


0 


0 




2 


2 




3 -1 




3 


2 




] 


1 



a) 


-63 45 


b) 


63 45 


c) . 


-1.3 1.8 


d) 


-63 


-72 




-72 -18 




-72 18 




-1.1 -4.5 




45 


-18 





















10) Choose the right answer.

-1 0 
0 -1 



V 



a) 



l/(-l) 0 
0 l/(-l) 



b) 



1 
0 



0 

1 



o) 



-1 

0 



0 

-1 



d) 



11) What is the order of the product of the matrices below?

a) 2x2 matrix

b) 3x2 matrix

c) 3x3 matrix

d) 2x3 matrix

e) not computable






*- 




-4 


5 




8 


■ -3 






» 






7 


-1 


X 


-8 


-1 


= ? 


i 














4 ' 


18 










12) How many two-factor products involving A, B and their transposes are
computable? (e.g., AB', BA' and B²)



A = 



a) none 



5 
-3 



4 
4 



B = 



5 


10 . 


-1 


1 


19 


5 



b) 1 



c) 2 



d) 6 



e) more than 6 



13) Suppose a matrix A is a 2 x 2 symmetric matrix. Choose the letter
whose statement is not true.

a) A = A'

b) A is a square matrix

c) AB = BA for any 2x2 matrix B

d) If the inverse of A, A^-1, exists then A^-1 = (A')^-1

14) C = AB where A is p x q, B is s x t. Which of the following
statements is not necessarily true?

a) The order of C is p x t     b) p = s     c) s = q

15) Choose the correct answer.



a) 



-3 
0 



3 
0 



9 • 15 
3 3 



3 
-2 

b) 



9 
3 



15 -6 
- 4 5 



c) 



9 
6 



9 
6 



3 
-9 



d) 



12 
0 



12 
4 



4 
-6 






16) Choose the right answer.

1 



-1 

2 



0 
-3 



6 
5 




-2 0 
4 =1 
0 1 



= ? 



a) 



-2 
12 



c) 



6 
-3 



2 6 
-16 8 



b) 



d) 



-2 *6 

4 ' -1 

4 6 

-16 5 



17) Two 3-dimensional vectors u' = (u1, u2, u3), v' = (v1, v2, v3),
and a 3 x 3 matrix A are given. Choose the wrong statement.

a) u'v is a number
b) uv' is a number
c) v is a number
d) vu' is a matrix
e) v'A'u is a number



18) If A =

    a  b
    c  d

its determinant |A| is equal to

a) -ab + bc
b) a + d
c) ad - bc
d) ac - bd
e) a + b + c + d



19) The cofactors of the elements of the first row of

| a  b |
| c  d |

are

a) d, -c   b) d, c   c) d, -b   d) b, c   e) b, -c



20) The cofactors of the elements of the 1st column of

| a  b  c |
| d  e  f |
| g  h  i |

are



a) 


e 


f 




b 


c 




b 


c 






h 


i 


> 


h 


i 


9 


e 


f 


« 


b)- 


e 


*f 




b 


c 




b 


c 






ih 


i 


> 


h 


• i 


9 


e 


f ' 




c) 


d- 


e 




d 


f 




d 


e 






8 


h 


9 


g 


i 


9 


g 


h 


* 




a 


b 




a 


f 




a 


b 


> 




d 


e 


9 


g 


i 


9 


d 


e 





e) a, -b, c 






21) For a given matrix A =

| a  b  c |
| d  e  f |
| g  h  i |

choose the correct statement.



a) 


d 


e 


f 


-e 


d 


f 


+f 


d 


e 






h 


i 




g 


i 




g 




b) 


a 


e 


c 

t 


-b 


d 


f 


+c 


d 


e 






h 


X 




g 


i 




g 


h 


- c) 


a 


e 


f 


-b 


d 


■ f 


+c 


d 


e 




* 


h 


i 




g 


i 




g 


h 


d)  a | e  f |  - b | d  f |  + c | d  e |
      | h  i |      | g  i |      | g  h |

22) If A + B = A + C then B = C

a) true b) false








23) If AB = AC then B = C









= 0 



a b c 

d e f 

g h .i 

X 

a b " c 

d e ' f 

a h . i 

a + e +'i 



a) true b) false 

24) If AB = 0 then necessarily A = 0 or B = 0
a) true b) false

25) AB = BA for any matrices A and B
a) true b) false

26) A ( B + C ) = AB + AC
a) true b) false

27) ( A + B )' = B' + A'
a) true b) false

28) ( AB )' = A' B'
a) true b) false



29) If A = A' then AA' = I
a) true b) false

30) If P is invertible and B = P^-1 A P, then the determinants of B and
A are equal.
a) true b) false

31) Let A, B, C, D be n x n matrices. Then the determinant of the 2n x 2n
matrix

| A  B |
| C  D |

is the determinant of the matrix AD - BC.

a) true b) false c) I don't know



/ 



32) Choose the right answer. The adjoint matrix of A =

| -9    -.33 |
|  .5   2.1  |

a)  | -9.00   0.50 |      b)  |  2.10  -0.33 |
    |  0.33   2.10 |          |  0.50  -9.00 |

c)  |  2.10   0.33 |      d)  |  2.10  -0.50 |
    | -0.50  -9.00 |          |  0.33  -9.00 |



33) Choose the inverse of the triangular matrix

|  3   0   0 |
| -2   5   0 |
|  1  -6  -2 |

a)  |  3   0   0 |        b)  | 1/3  -2    1 |
    | -2   5   0 |            |  0   1/5  -2 |
    |  1  -6  -2 |            |  0    0    1 |

c)  | 1/3   0     0  |     d)  |  1/3     0     0  |
    | -2   1/5    0  |         |  4/30   1/5    0  |
    |  1   -6   -1/2 |         | -7/30  -3/5  -1/2 |



34) Which one of the following has orthogonal row vectors?

a)  | 1  0 |    b)  |  1  1 |    c)  | 1  -1 |    d)  |  0  1 |
    | 1  1 |        | -1  1 |        | 1  -1 |        | -1  1 |



is 






35) Which of the following transformation matrices is not orthogonal? 



a) 


1//T 


2//~5 


b) 


0 -1 




*c) 


l//~2 


-1//.2 




2//T 


-l//~5 • 




i • 0 






1//T 


1//T 


d) 


-1 0 


e) 


3// 10 


1// 10 










0 1 




1//T0 


3/ /To 









36) The product of two orthogonal transformations is an orthogonal
transformation matrix.

a) true b) false

37) The row vectors contained in an orthogonal transformation matrix
are mutually orthogonal but are not necessarily of unit length.

a) true b) false

38) The column vectors contained in an orthogonal transformation
matrix are not mutually orthogonal when the row vectors are
mutually orthogonal.

a) true b) false

39) Any rigid rotation is an orthogonal transformation matrix.

a) true b) false

40) An orthogonal transformation of axes will not change the length
of vectors in the space.

a) true b) false
41) Suppose matrix Σ =

| 5.3    .5  |
|  .5  10.1  |

is a variance-covariance matrix. Choose the wrong statement.

a) The characteristic equation of Σ is |Σ - λI| = 0

b) The characteristic equation of Σ is λ^2 - 15.4λ + [remainder illegible]

c) Σ is always transformed by some matrix into the diagonal form.

d) The roots of the characteristic equation might be complex variables.






For items 42-46, assume the following:

Suppose Σ is an n x n variance-covariance matrix, λ1 ≥ λ2 ≥ ... ≥ λn
are its characteristic roots (or eigenvalues), and v1, v2, ..., vn are the
characteristic vectors (or eigenvectors) of Σ associated with
λ1, ..., λn respectively.

42) If the rank of Σ is n, then the eigenvectors v1, ..., vn are
linearly independent.
a) true b) false

43) If λ1, λ2, ..., λn are of distinct values, then λ1 is the largest
variance of any linear combination of the x_i with fixed norm of the
coefficient vector.
a) true b) false

44) Some of the eigenvalues may be negative.
a) true b) false

45) v_i, v_j are not mutually orthogonal.
a) true b) false

46) Choose the correct answer.

a) The constant term of the characteristic equation of Σ is the
trace of Σ.

b) The constant term of the characteristic equation of Σ is the
determinant of Σ.

c) The constant term of the characteristic equation of Σ is the
determinant of adjoint matrix Σ.

d) None of the above is correct.

47) How are the eigenvalues and eigenvectors of Σ^-1 related to those
of Σ?

a) The eigenvalues of Σ^-1 are the same as those of Σ, but the
eigenvectors are inverted.

b) The eigenvalues of Σ^-1 are reciprocals of those of Σ, and
the eigenvectors are inverted.

c) The eigenvalues of Σ^-1 are reciprocals of those of Σ, but
the eigenvectors are unchanged.

d) Both the eigenvalues and eigenvectors of Σ^-1 are respectively
the same as those of Σ.




48) If a 2 x 2 matrix A has eigenvalues λ1, λ2, then the eigenvalues
of kA (where k is a scalar) are

a) k^2 λ1, k^2 λ2

b) [illegible]

c) λ1, λ2

d) kλ1, kλ2

e) kλ1, λ2

» 



/ 



You have completed the test.
Press -BACK- if you wish to review your work and make changes. 
Press -NEXT- to review the test and to see the correct answers. 
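
Several of the true/false items above (25, 28 and 30) concern identities that can be checked numerically. The following is a minimal sketch, not part of the original test or of the PLATO lessons; the helper names and the three example matrices are hypothetical, chosen only for illustration, and the helpers handle 2 x 2 matrices only.

```python
# Illustrative 2 x 2 helpers; none of these names come from the report.
def matmul(X, Y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inverse(X):
    """Inverse via the adjoint matrix divided by the determinant."""
    d = det(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

# Hypothetical example matrices (assumed, not taken from the test items).
A = [[5, 4], [-3, 4]]
B = [[1, 2], [0, 1]]
P = [[2, 1], [1, 1]]

# Item 25: matrix products need not commute.
commutes = matmul(A, B) == matmul(B, A)

# Item 28: the transpose of a product reverses the order, (AB)' = B'A'.
reversal_holds = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))

# Item 30: a similarity transform P^-1 A P leaves the determinant unchanged.
similar = matmul(matmul(inverse(P), A), P)
det_preserved = abs(det(similar) - det(A)) < 1e-9
```

A single numerical example can refute a claimed identity (as for commutativity here) but only illustrates, and does not prove, the identities that hold in general.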






Appendix C

Description of Contents in the Lessons of Chanute

lesson        average time    content

103           33.27           Principles of Gas Engine
104a          34.28           Identification of Parts and Purpose
104b          missed          Gasoline Engine Compressor
105           44.74           Cooling System
201a          12.55           Air and Exhaust System
201b          42.31
202a          189.63          Fundamentals of Electricity
202b          14.24           Batteries
203a
203b          missed          Electrical Schematics
203c
204           100.[?]0        Starters
205a          136.51          Cranking Motors, DC Charging System
205b          41.20           AC Charging System
206a          50.22
206b          21.43           Battery Ignition
206c          43.69
207           37.77           Emission Control
301           32.40           Diesel Engines
303           14.04           Lighting System
304           12.81           Warning System
[illegible]   [illegible]     Clutches
307           72.67           Basic Hydraulics
308           46.60           Fluid Couplings/Torque Converters
401           20.84           U-Joints/Propeller Shafts
402           91.09           Differentials
403           13.35           Transfer Case/PTO
404           12.60           Suspension System
405a          31.17           Hydraulic and Mechanical Brakes
405b          52.96           Air Brakes
405c          13.64           Power Assisted Brakes






Appendix D

Description of PLATO Programs and their Programmers

Lesson Name     Programmer and Description

matxA           Jim Kraatz

    Test items were developed by one of the authors, but the
    test frame and data collection scheme were developed by Jim
    Kraatz of CERL. Up to 50 items can be handled, and item
    scores, response time for each item, and the selected option
    of multiple choice are collected. The traditional item
    analysis, such as means and discriminating powers of each
    item, is given.

edittest        Robert Baillie, Jim Kraatz

    Routine for editing data from the "matxA" test driver.

storetest       Robert Baillie

    Transformation routines. This program prepares the data
    from the "matxA" test driver for various analyses such as
    "datara," "wb2," and "Kolmo."

gram            Robert Baillie

    Orthogonalizes up to 10 vectors by the Gram-Schmidt method and
    estimates an individual student's gain scores. Eight
    vectors (variables) besides the pre-test and post-test can
    be used to step up the accuracy of the gain scores.

subr            Jerry Dyer and Robert Baillie

    Calculates various probability functions. They can be
    used as a statistical table by condensing this lesson, but
    they are mainly used as subroutines in users' programs.

    This program contains F and F^-1 distributions, χ^2 and
    inverse χ^2, normal and inverse normal distributions, t
    distribution, binomial, beta, incomplete beta, and two-
    parameter gamma distributions.

matsubr         Jerry Dyer, Robert Baillie, and Kay Tatsuoka

    Calculates the inverse, eigenvalues and eigenvectors, and
    determinant of a 20 x 20 matrix.

Kolmogorov      Robert Baillie and Jerry Dyer

    Kolmogorov-Smirnov test of a sample and a given
    theoretical distribution function, such as Weibull, Gamma, or
    normal distributions. Uses "statedit" to input data.

cutoff          Tamar Weaver

    Evaluates the optimum cutoff scores of a criterion-
    referenced test and calculates the estimates of the false
    positive, false negative, failure, and success rates based on a
    user's specified true mastery level and observed cutoff
    score. Classifies an individual's score into one of four
    status groups: pure pass, fail, false positive, or false
    negative.

multreg, multrega, multreg2, multreg[illegible],
and linkage program to statedit, formatf

    Kumi Tatsuoka, Robert Baillie, Tamar Weaver
    Inputs raw data and a matrix into temporary storage, and
    calculates a correlation matrix up to 20 x 20, partial
    correlations, and stepwise multiple regression. Data
    stored in a dataset via Felty's "statedit" are
    acceptable.

lintest         Kay Tatsuoka

    Tests linearity of the data.

manova          Kay Tatsuoka, Robert Baillie

    Multivariate analysis of variance.

sscp            Robert Baillie

    Discriminant analysis for one factor, several groups and
    variables, using a dataset with "statedit" data format.

sscp2           Robert Baillie

    Discriminant analysis with temporary storage.

factdisc        Kay Tatsuoka

    Factorial discriminant analysis.

ccor            Robert Baillie

    Canonical correlation analysis; takes data stored in
    "statedit" format and needs a dataset.

ccor2           Robert Baillie

    Canonical correlation analysis using temporary storage.

varimax         Kay Tatsuoka, Robert Baillie

    Does principal component analysis and rotates a factor
    matrix by Varimax rotation.

area package    Tamar Weaver, Al Avner, Kumi Tatsuoka

    Collects the area data specified by an author in his/her
    lesson.

formatk         Tamar Weaver

    Transforms area data into a "statedit" format dataset.

kstl            Kay Tatsuoka

    Augments data from several different datasets which
    are stored in the "statedit" data format.

chitest         Kay Tatsuoka

    Does a simple factorial analysis of variance and
    goodness of fit test.

mdcl            Mark Bradley

    Edit and simple analysis of the quizzes and tests in
    the matrix algebra lessons. These data were not
    used in the report.
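
The "gram" lesson above is described as orthogonalizing vectors by the Gram-Schmidt method. As a rough illustration only, the sketch below shows the orthogonalization step in Python. It is not the PLATO code, it uses the modified (numerically steadier) variant of Gram-Schmidt as an assumption, and it does not attempt the gain-score estimation. The function name and the example vectors are hypothetical.

```python
def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of `vectors`.

    Modified Gram-Schmidt: each projection is removed from the
    running remainder rather than from the original vector.
    Vectors that are (numerically) dependent on earlier ones are dropped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))   # projection length
            w = [wi - coeff * qi for wi, qi in zip(w, q)]  # remove projection
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical example: three linearly independent vectors in 3-space.
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Every pair of distinct vectors in `basis` has inner product zero (up to rounding) and every vector has unit length, which is the defining property of the rows of an orthogonal transformation matrix.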






Appendix E

Tables of p-values and the Weibull Parameters

Table E1

Kolmogorov-Smirnov Tests for Matrix Algebra Pretest Items for OK subgroup

item    p        z        N        item    p        z        N

 1)     [illegible]                25)     [illegible]
 2)     [illegible]                26)     0.0001   [illegible]  81
 3)     0.6417   0.7414   78       27)     0.1097   1.2048   49
 4)     0.5115   0.8203   79       28)     0.5071   0.8231   36
 5)     0.4456   0.8631   94       29)     0.9999   0.3234   26
 6)     0.1558   1.1296   96       30)     0.6781   0.7198   24
 7)     0.2213   1.0488   61       31)     0.3388   0.9409   31
 8)     0.6089   0.7609   54       32)     0.7088   0.7015   33
 9)     0.3063   0.9677   70       33)     1.0000   0.2735   29
10)     0.9228   0.5500   65       34)     0.6092   0.7607   27
11)     [illegible]                35)     0.1706   1.1093   57
12)     [illegible]                36)     0.3154   0.9600   37
13)     [illegible]                37)     0.9646   0.4989    9
14)     0.7459   0.6790   42       38)     0.6187   0.7551   23
15)     0.1285   1.1716   49       39)     0.7291   0.6892   16
16)     0.8567   0.6057   60       40)     0.1785   1.0989   28
17)     0.8832   0.5852   21       41)     0.9731   0.4844   21
18)     0.9587   0.5078   61       42)     0.5664   0.7864   20
19)     0.8643   0.6000   26       43)     0.7422   0.6812   18
20)     0.5751   0.7811   46       44)     0.9407   0.5309   16
21)     0.9650   0.4984   42       45)     0.8249   0.6282   12
22)     0.0065   1.6928   77       46)     1.0000   0.3200   16
23)     0.7124   0.6993   16       47)     0.2973   0.9754   40
24)     0.9942   0.4218   27       48)     0.8687   0.5967   32

Pretest for all subjects after 1976 Fall semester; 'goodness
of fit' testing for Weibull distributions
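
The p and z columns in these Kolmogorov-Smirnov tables appear to be linked by the usual large-sample Kolmogorov series, p = 2 Σ (-1)^(j-1) exp(-2 j^2 z^2). The sketch below is an illustration under that assumption and is not the report's "Kolmogorov" lesson. It further assumes that z is the maximum distance D between the empirical and theoretical distribution functions scaled by the square root of N, which the report does not state in this section. All function names are hypothetical.

```python
import math

def ks_statistic(sample, cdf):
    """Maximum distance D between the empirical CDF of `sample`
    and a theoretical CDF given as a function `cdf(x)`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def kolmogorov_p(z, terms=100):
    """Asymptotic tail probability of the Kolmogorov distribution at z."""
    if z < 0.2:
        # The alternating series converges slowly here; p is essentially 1.
        return 1.0
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
            for j in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))

# Item 4 of Table E1 lists z = 0.8203 with p = 0.5115.
p_item4 = kolmogorov_p(0.8203)

# Toy check of D against a uniform(0, 1) CDF (hypothetical data).
d_uniform = ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0))
z_uniform = math.sqrt(3) * d_uniform  # assumed scaling z = sqrt(N) * D
```

Under these assumptions a small z (close agreement between sample and model) gives a p near 1, and a z above roughly 1.36 gives p below 0.05, matching the pattern of the tabled rows.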




Table E2

The Three Weibull Parameters for Matrix Algebra Test Items



v. 



items " 




m. c. 








1 . 


10. 59^ 


• 0.96 


r. 15 




'\44. 51 


-2* 


2.55*" 


0. 96 


1 . 92 




28.12 


3. 


6.02' 


0.99 


1 . 24 




23. 34 


4-. 


0.00 


0. 98 


2.19 




32. 35 


5. 


,7. 54 


0.' 99 


1 . 25 




13. 70 


6 . 


5.51 


• 0.93 


1 . 35 




12. 67 


" 7 . 


7. 52 


0.99 * 


1.15 




.33. 92 


* 

8. 


0.00 


0. 98 


1 . 56 




57. 62 


9 . 


4.00 


0.99 . 


1.71 




40. 68 


10. 


7.2 7 


K00 


1 . 45 




2 2.00 


1 1 . 


4. 49 


0.98' , 


A 33 




41-. 95 


12. 


8 .2 4 


0.99 


V - V 7 




57.67 


13. . 


5."41 


0. 99 


18 




5,7 . 2 1 


14. 


9.2-9 


0. 99 


1 . 34 




43. 46 


15. 


0.00 


0. 97 


1 .08 




11 2. 06 


16. 


0. 45 


*0. 99 


1.26 




57. 26 


17. 


15.24 


0.99 


1.14 




108. 62 


13. ~ 


2. 77 


1 .00 


r . 6 3 




2 1.67 


19. ■ 


0.00 


0. 99 


1 . 24 




33. 70 


20T. 


1 . 37 


0. 99 


1 .00 




49.98 


21 . 


5.24 


1 .00 


1 .04 




49. 83 


22. . 


0. 61 


0.94 


1 . 90 




23. 27 


23. 


3.19 


' 0. 95 


1 . 02 




26.15 


24. 


4. 60 


1 .m 


1.14 




13.91 


25. 


3. 47 


0. 98 


1 . 42 




1 1 .04 


26.. 


1 . 96 


0. 93 


0. 87 




« 13.70 






Table E2 (con't)

The Three Weibull Parameters for Matrix Algebra Test Items



1 *L *£fflS 


U J0f 


Hi. C • 








k. y ■ 




U ■ 1 o 


.-J ■ 


P. 4 




•J. o ■ 


? Q 4 

v.' . . T 




|Tf 


» J c. 


Q p Q 


L. I? ■ 


A R ft 




y ■ 


ri 7 


OCT »;'7 




s B 72 




M 


ft s 


O O h Q 
dfj . 0 7 


3 1 


1 ^ ■"■ 




1 

I . 


■V Q 
J 7 


o n o 
sJ i. . ? -7 


-1 "I . 

•J 4. • 

o -"■ 

sJ J ■ 


r » : 

l_ H - U 

»J „ 1 M .U 


gy '"i q 

V 

1 l.Yfif 
i ■ J JJI3 


fj 
1 

1 9 


U 1 


V 

1 T' o f 7 
1 j U - L< 


o 4 


n: cr 1 
p T 1 


■ :■' u 


iTf 

• » ' . 






35 . 


] . 96 




1 . 


7 1 


5 2. 95 


36.* 


3. 89 


0 . 9 8 


0. 


91 


13.15 


37. 


D . 65 


9.97 


0. 


93 


1 3 . ft 6 


>:» 8 ■ 


f.. 74 


1 . 00 




46 


17.04 


3*} 


3.12. 


0 . 9 13 




49 


9. 73 


4JB. 


3. 77 


0.93 




10 


9 . 7 6 


41 . 


4.21 


0.99 




79 


38. ?5 


42. 


3 . 46 


0. 93 




26 


31.78 


43. 


7.23 


0 . 9 9 




42 


21.32 


44. 


• 3. 84 


* 3.90 




35 


15. "'3 


45. 


3. 34 


0. 99 




71 


17.: -J 


'46. 


1 . 7J3 


5f . 9 9 




31 


00, 6 4r 


47. 


1.21 


0.99 




22 


1 6 . 6 1 


48. 


5.17 


0.99 




19 , 


1-5. o3 



*Pretest given after 76 Fall semester, OK subgroup
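
For orientation, the three Weibull parameters named in the report's abstract (location, scale, and shape c) enter the distribution function and the conditional response rate (hazard rate) as sketched below. This is a generic textbook parameterization offered as an assumption; the report's own estimation routine is not reproduced, the column-to-parameter assignment of the table above is not inferred here, and the numeric values used are hypothetical.

```python
import math

def weibull_cdf(t, loc, scale, c):
    """P(T <= t) for a three-parameter Weibull distribution.

    loc   : location (theoretical minimum response time)
    scale : scale parameter (same time units as t)
    c     : shape parameter
    """
    if t <= loc:
        return 0.0
    return 1.0 - math.exp(-(((t - loc) / scale) ** c))

def weibull_hazard(t, loc, scale, c):
    """Conditional response rate f(t) / (1 - F(t)).
    Returned as 0.0 at or below the location parameter for simplicity."""
    if t <= loc:
        return 0.0
    return (c / scale) * ((t - loc) / scale) ** (c - 1.0)

# Hypothetical parameter values, not taken from the table.
loc, scale, c = 10.0, 40.0, 1.5
cdf_at_50 = weibull_cdf(50.0, loc, scale, c)        # equals 1 - exp(-1)
hazard_at_20 = weibull_hazard(20.0, loc, scale, c)
hazard_at_50 = weibull_hazard(50.0, loc, scale, c)  # larger, since c > 1
```

With c greater than 1 the hazard rises with time, with c equal to 1 it is constant (the exponential case), and with c less than 1 it falls, which is why the shape parameter is the one the report relates to item characteristics.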






Table E3

Kolmogorov-Smirnov Tests for Matrix Algebra Pretest Items for NO subgroup

item    p        z        N        item    p        z        N

 1)     0.9912   0.4364   10       25)     0.2976   0.9751   38
 2)     1.0000   0.2797    4       26)     0.9776   0.4754   19
 3)     0.4068   0.8899   22       27)     0.1566   1.1285   51
 4)     0.6647   0.7277   21       28)     0.0408   1.3950   64
 5)     0.9882   0.4474    5       29)     0.0217   1.5035   74
 6)     0.4203   0.8804    4       30)     0.0205   1.5130   76
 7)     0.7219   0.6936   38       31)     0.9192   0.5535   69
 8)     0.9458   0.5249   46       32)     0.7305   0.6884   67
 9)     0.9020   0.5694   30       33)     0.6574   0.7321   71
10)     0.9608   0.5048   34       34)     0.5696   0.7845   73
11)     0.1987   1.0743   66       35)     0.5377   0.8040   43
12)     0.8269   0.6268   70       36)     0.0387   1.4043   63
13)     [illegible]                37)     [illegible]
14)     0.9603   0.5056   58       38)     0.2083   1.0632   76
15)     0.6791   0.7192   51       39)     0.3499   0.9321   83
16)     0.9418   0.5296   40       40)     0.4919   0.8328   71
17)     0.8424   0.6160   79       41)     0.4429   0.8649   78
18)     0.8033   0.6426   39       42)     0.7938   0.6488   79
19)     0.7451   0.6795   74       43)     0.0725   1.2880   81
20)     0.2778   0.9928   54       44)     0.0098   1.6311   83
21)     0.0108   1.6162   58       45)     0.0000   2.4586   87
22)     0.9068   0.5652   23       46)     0.3486   0.9332   83
23)     0.0097   1.6327   84       47)     0.1819   1.0947   58
24)     0.0291   1.4545   73       48)     0.2734   0.9968   [illegible]

Pretest for all subjects after 1976 Fall semester; 'goodness
of fit' testing for Weibull distributions






Table E4

The Three Weibull Parameters for Matrix Algebra Test Items



items 




J m . c . 


C 




1 . 




0 . 9 9 


2.08 


5 2 . 6 4- 




1 1 . 49 


1 . .00 


0. 49 


3 2,09 


J • 


6. 50 


¥ 0.9 6 


1 .05" 


19.29 


4. 


7 . 6 5 


0 . '.' 0 


1 .02 


14. <;7 




5.64 


1 . 00 


2. 10 


i 7 . * 5 


6 . 


0 . 00 " 


0 . ? j 


' 0 . ft 5 


i . 7 9 


• 


6 . 2 4 


0 . 9 9 


i . 44 


37. -13 


•J . 


@ . 00 


0 . 9 9 


1.33 


43. *:5 


9 . 


0.28 


0.99 


1.14 • 


2 7.68 , 


10. 


2.54 


1 . 00 


t . 1 1 


20. 9 6 


11. 


• '^.95 


0 . 9 7 


'l . 45 


3 9'. ? 7 


12. 


3 . 2 1 


- 1 . 9 0 


1.13 


5 8.- 2 4 


13. 


0.00 


0 . 9 9 


1 . 60 


47 . 'J 1 


** 

H. 


2.06 


1 . 00 


1.24 


45. 3 5 


1 5 . 


b . / 4 


y • 1 j 


A Q ■"> 


D ^ • H £ 


16. 


0. 5 3 


0 . 9 9 


0.9 3 




17. 


0.5 5 


0 . 9 9 


1.13 


41 . 25 


r8» 


1 .07 


• 0. 99 


1 . 34 


::4. 76 


19. 


2. 28 


1 . PT0 


1 . 38 


26. i 2 


20. 


3. '89 


0 . 9 7 


*0. 89 


22- r:3 


? 1 


&' 40 


0.9 6 


1 . 22 


34. 10 


22. 


2.2 9 


0. 99 


1.17 


17. u'3 




1 . 48 


0.9 7 


. 1.56 




2 4. 


- 3 . 43 


'T . 9 3 


1 . 59' 




r 


2.6 2 


0. "'.J 


i . 08 






0. 42 


u' , 9 9 


U 59 


t "T » ^ 






♦ 




/ 








* 















Table E4 (con't)

The Three Weibull Parameters for Matrix Algebra Test Items



items 


V 


m . c . 


c 






27. 


0. 37 


0.9 8- 


1 . 64 


1 1 . 


52' 


28. V 


1 . 98 


0.9 3 


0. 80 


i* ■ 


12 


29: 


1 . 83 


10.98 ' 
0 . 9 '? 


• 1.12 


10. 


86 


30. 


1.85 


1,13 


13. 


9 7 


3 1 . 

32. 


2, 85 
4.61 


1 . 00 

fT00 


"0. 95 
1 . 10 


13. 
35. 


14 

26 


33. 


1 . 75 


0. 99 


0. 98 


44. 


33 


34. 


i? . m 


0 . 9 9 


2.10 


24. 


06 


35. 


2. 39 i 


0 . 9 9 


1.13 


29. 


62 


3,6. 


, - 1 . 83 


0. 97 


\ r . 28 


8. 


41 


37. 


2. 7 6 


1 . 00 


1.16 


1 1 . 


64 


38\ 


2.91 


0.9 9 


0.81 


8. 


30 


39* 1 


1.78 


1 . C>0 ' ' 


1 . 3 t 1 


6 . 


59 


40. , 


1.80 


0. 99 


1.11 


6. 


6 3 


41 . 


4.8t5' ' 


. 0. 99 


0.95 • 


28. 


31 


42. ' 


4. 76 , 


1 . 00 


0. 74 


10. 


6.6 


43. 


3.97 


0.99 , 


0. 95 


7 . 


00 


44. 


2. 97 


0.9 7 


- 0. 94 


7 . 


19 


45. 


1 . 94 


0. 9 4 


1 .03 . 


9. 


64 


46. ' 


1 . 64 


0. 99 


i: 34 


16 . 


74 


47. 


5. 85 


0. 98 


1 .06 


14. 


'48 


48. 


1 . 49' 


0. 99 


■ 1 . 20 


17 . 


99 



*Pretest given after 76 Fall semester, NO subgroup






Table E5

Kolmogorov-Smirnov Tests for Matrix Algebra Test Items for OK subgroup

item    p        z        N        item    p        z        N

 1)     0.0000   2.7994   90       25)     0.0481   1.3651   62
 2)     0.0008   1.9841   96       26)     0.0000   4.79[?]3 81
 3)     0.1823   1.0942   78       27)     0.0053   1.7213   49
 4)     0.9139   0.5586   79       28)     0.0004   2.0688   36
 5)     0.1076   1.2088   94       29)     0.8979   0.5730   26
 6)     0.0042   1.7557   96       30)     0.0095   1.6362   24
 7)     0.1012   1.2213   61       31)     0.2322   1.0373   31
 8)     0.6397   0.7426   54       32)     0.6328   0.7467   33
 9)     0.6809   0.7181   70       33)     1.0000   0.2884   29
10)     0.8039   0.6423   65       34)     0.0900   1.2452   27
11)     0.3451   0.9359   34       35)     0.2346   1.0348   57
12)     0.2275   1.0422   30       36)     0.0062   1.6985   37
13)     0.9137   0.5588   43       37)     0.8640   0.5964    9
14)     0.6949   0.7098   42       38)     0.5605   0.7900   23
15)     0.2306   1.0389   49       39)     0.7606   0.6699   16
16)     0.9160   0.5566   60       40)     0.0329   1.4332   28
17)     0.7804   0.6575   21       41)     0.8993   0.5718   21
18)     0.9689   0.4920   61       42)     0.0178   1.5363   20
19)     0.5555   0.7931   26       43)     0.7607   0.6698   18
20)     0.5510   0.7958   46       44)     0.9579   0.5090   16
21)     0.8376   0.6194   42       45)     0.8395   0.6181   12
22)     0.0037   1.7754   77       46)     0.9985   0.3845   16
23)     0.0745   1.2826   16       47)     0.3650   0.9205   40
24)     0.9468   0.5236   27       48)     0.7982   0.6460   32

Pretest for all subjects after 1976 Fall semester; 'goodness
of fit' testing for Gamma distributions






Table E6

Kolmogorov-Smirnov Tests for Matrix Algebra Test Items for NO subgroup



* w w ^ ^ 1 1 

l" 1 0.9996 0 3540 Iff 


item p z U 
0. 0805 1.Z675 38 


0.9623 0 5026 4 


£o)\ i5.ol51 0. 6348 19 


3 J 0.1775 1 . r003 22 


Jd. V0o9 5 1 


4) 0.0748 1.28T7 21 


?ft1 R crcrcr £r o o i i o ^ 
to; jo . bjojdv 4 # 7 1 1 o o4 


5) 0.9592 0.5071 5 


2Q1 R RR?fi* 1 fti i i 7^ 

7 ^ JD«JEjJD£0'1«0111 / "t 


6) . 0.3186 0.9573 4 


301 0 R73R 1 4fifi? 76 

-*J *J } JD • C t. J U 1 • lOO J /O 


7) 0.7544 0.6737 38 


.311 0 IQQflf 1 R7?Q A Q 

» • 1 7 7JD 1 • JD / 0 7 0 7 


8) 0.9885 0.4463 46 


321 0 Q0Q0 R 5 6? 1 67 

O • 7JD 71) JD « 7 O J 1 0/ 


9) 0.7421 0.6813 30 


331 * R RRR1 ? ?R?? 71 


10) 0. 9417 0. 5298 3-4 


3 41 R PQA^ R 70 


11) 0.0010 1.945 6 6-6 


cr ceo? a ~?qj±q a\*s 
oz>) v.z>?cf ti./y4o 4J 


12) 0.8839 0 5847 '70 


oo; jy . JDJ0 11 1.9415 63 


131 R R74Q R c;qi P c;7 


37) 0.3386 0.9411 90 


14) 0.9500 0.5196 58 


38) 0.0000 2.5784 76 


15) 0.0000 2/68 2 3 51 


39) ' 0.2728 0.9974 03 


16) ' 0.9646 0.4990 40 


40) 0.0279 1.4616 71 


17) 0.8622 0.6016 79 v * 


41) 0.-2250 1.0449 78 


18) 0.931'6 0.5410 39 / 


42) 0.0026 1.8225 79 


' 19) • 0.7355 0.6853. 74 


4t 43r 0.0003. 2.1121 81 


20) r 0.0027 1.8167 54 " 


44) 0.-0000 2. 9649 83 


21) 0.0966 1.2309' 58 


'« 45) 0.0000 3.6842 87 


2$) 0.912*4 0.5600 23 


46) 0.0683 1.2993 83 


23) 0.0645 1.3103 84 


47) 0.0075 1.6713 58 


24) 0.0206 1.5126 73 


48) 0.2291 1.0405 66 



Pretest after Fall 76: fitting Gamma






Table E7

Kolmogorov-Smirnov Tests for Matrix Algebra Pretest Items, Matched Sample

        NO Group                           OK Group

item    p        z        N        item    p        z        N

 1)     1.0000   0.3196    5        1)     0.6927   0.7111   51
 2)     0.2700   1.0000    1        2)     0.2707   0.9994   55
 3)     0.1690   1.1114   13        3)     0.8910   0.5789   43
 4)     0.9456   0.5251   12        4)     0.9976   0.3962   44
 5)     0.8928   0.5774    3        5)     0.7999   0.6449   53
 6)     0.9996   0.3536    2        6)     0.1503   1.1375   54
 7)     0.9928   0.4290   30        7)     0.8781   0.5894   26
 8)     0.9922   0.4320   31        8)     0.7253   0.6916   25
 9)     0.8941   0.5762   21        9)     0.8755   0.5914   35
10)     0.9250   0.5479   24       10)     0.8941   0.5762   32
11)     0.6932   0.7108   34       11)     0.9113   0.5611   22
12)     0.9954   0.4145   40       12)     0.7020   0.7055   16
13)     0.9375   0.5345   35       13)     0.9185   0.5542   21
14)     0.7875   0.6529   34       14)     0.9518   0.5173   22
15)     0.7963   0.6472   35       15)     0.7017   0.7057   21
16)     0.9873   0.4503   29       16)     0.9962   0.4089   27
17)     0.9303   0.5424   49       17)     0.9171   0.5556    7
18)     0.9100   0.5623   26       18)     0.9941   0.4225   30
19)     0.6220   0.7531   25       19)     0.4319   0.8725   31
20)     0.9818   0.4656   11       20)     0.0389   1.4036   45
21)     0.2851   0.9861   31       21)     0.5322   0.8074   25
22)     0.1638   1.1184   44       22)     0.5341   0.8062   12
23)     0.1910   1.0834   45       23)     0.9999   0.3228   11

Pretest for matched group after 1976 Fall semester; 'goodness
of fit' testing for Weibull distributions

< - \ 

Table E8

The Three Weibull Parameters for Matrix Algebra Test Items






items 




m . c . 


c 






* 


1 . 


, 9.87 


0.9 7. 


> 1-. 44 


33^67 






2\ 


2 . 93 


0.99 . 


2. 42 


24. 10 






• 3> 


5. 33 


0. 9*9 


1.39 ' 


20.21 






4. 


6 . 30' 


1 . 00 


1 . 65 


23. 73' 






5. 


8'. 77 


0: 9 9 


1.17 


*1 3 . 13 






6 . 


C O 1 


7k n o 

A3 . V ? 


1 . 5 4 


11 . 99 








y . 0 6 


l . jyjy 


1.11 


3 3.17 « 




* 


8. 


0.JCJ0 


0f. 98 


' 1 .-45 '* 


60. ft 8 




• 


9. 


9 . 85 


1 .#0 


1 . 55 


32.95 






10. 


'8.14 


* 

0. 99 


• i .6? •• ; 


22. 43 






11. • 


4. 83 


.0. 98 


• 1.-12 


3,7. ?2 j 






1 ?. . 


8.89 ' 
5. 42 


0."98 


0 99 


36 55 






13. 


' J@L- 9 9 


1 107 


52.52-" 






14. 


10.29 , 


*1 . 00 


•0. 95 


36.24' 






15. 


24, 94 


0. 9 8> 


0. 99 


95. 10. 






16. * 


5.62 


1 . 00 


1 .,0 b 








17. 


0.00 


0.97 


r.25L 


131.76 . : 






Z4. 


3.48 


0. 93 


1 . 33 . 


12.^32^ " * 






5 . 


3 . 32 


0. 9y , - 


' 1.'54 








26. 


4 . 9 4 


0.9 6 

< 




6.74 






27. 


3. 94 




0.64 


4 9.89 . 






28. V 


3 .99 


*. 0. 98 


' 0 : 50 


7 1> 






29. 


7 . 79 


"' 1 . 00 : 


0. 64 


*» 24. 64 . 






Posttest for matched group, Multpost for 'OK' subgroup.














** * 














s - 






» 







Table E9

The Three Weibull Parameters for Matrix Algebra Test Items



items 
1 . 
3. 
4. 
5. 
6 . 

i 

8. 

9 . 
10. 
1 1 . 
12. 
13. ' 
14. 
15. 
16. 
17. 

24. 

25 . 

26. 

27 . 
23. 

29. 



L 0 
21 .04 

7. 45 

5. 18 

0 . 00 



0.00 
5.69 
6.01 
3.67 
1.91 
IB. 29 
4. 23 
9.09 
.3.41 
3". 46 
3.61 
4.73 

3.27 
2.6 9 
1 ..12 
2. 74 
2.92 
2.76 



130 

9 3 
9 9 

J0.0T 

00 

m 

99 
0. 99 
0 . 9 9 
0. 99 
1 . 00 
0. 98 
0. 99 
0. 99 
1 .00 
0. 99 

0. 93 
0 . 9 8 
1 .00 
0 . 9 9 
0. 99 
0. 97 



c 
1.12 
0. 90 
2.54 
3.97 
* * * * 

1:54 
1.15 
0.82 
1.31 
1 .,13 
1 . 37 



1 . 
1 . 
1 . 
0. 
0. 



23 
21 
13 
84 
9 8 



1 .56 
1.11 
1.34 
1.0 7 

0. 65 
1 .00 



29. 62 
21 . 42 
13-. 5 4 
21 . 49 



13 
32 
36 
27 
22 
30 
45 
32 
47 
54 
47 
32 



79 
93 
9 8 
40 
44 
31 
23 
31 
50 
65 
39 
02 



.9. 46 
10. 84 
1 i . 8 2 

6. 79 

5.16 
8.03 



r 



Multpost for NO subgroup, matched sample






Table E10

Kolmogorov-Smirnov Tests for Matrix Algebra Posttest Items, Matched Sample

        NO Group                           OK Group

item    p        z        N        item    p        z        N

 1)     0.0000   0.0000    0        1)     0.6305   0.7480   56
 2)     0.0000   0.0000    0        2)     0.7303   0.6885   56
 3)     0.8323   0.6231    7        3)     0.9344   0.5380   49
 4)     0.0000   0.0000    0        4)     0.5666   0.78[??] 56
 5)     0.9996   0.3536    2        5)     0.6345   0.7456   54
 6)     0.0000   0.0000    0        6)     0.1768   1.1012   56
 7)     0.0000   0.0000    0        7)     0.8277   0.6263   55
 8)     0.0000   0.0000    0        8)     0.7035   0.7046   55
 9)     0.9981   0.3899    4        9)     0.9364   0.5357   52
10)     0.0000   0.0000    0       10)     0.1114   1.2016   56
11)     0.7364   0.6848   36       11)     0.9354   0.5368   20
12)     0.9577   0.5093   27       12)     0.9788   0.4727   29
13)     0.9446   0.5263   16       13)     0.7324   0.6872   40
14)     0.7286   0.6895   12       14)     0.7929   0.6494   44
15)     0.9455   0.5252    7       15)     0.8850   0.5838   49
16)     0.9240   0.5489    4       16)     0.2204   1.0498   52
17)     0.5312   0.8080   30       17)     0.6169   0.7561   26
18)     0.9267   0.5461   21       18)     0.7893   0.6518   35
19)     0.9826   0.4636    4       19)     0.3674   0.9188   52
20)     0.9984   0.3850    5       20)     0.1832   1.0930   51
21)     0.9295   0.5432    7       21)     0.5858   0.7747   49
22)     0.9887   0.4457   21       22)     0.8147   0.6351   35
23)     0.7709   0.6635   36       23)     0.9030   0.5686   20

Posttest for matched group after 1976 Fall semester; 'goodness
of fit' testing for Weibull distributions






Table E11

The Three Weibull Parameters for Matrix Algebra Test Items



items 


*9 


rn . 0 . 


c 




i . 


11.57 


I"! . 9 R 


1 . 04 


.24* 29 




U . 1 J 


1 . J 


1 . 73 


1 3 . 43 




(: . 47 . 




1.29 


1 7 . a 1 


4-. 


i . 1 3 


PT . 9 9 


i . 32 


17. > 5 


q 

Li . 


* 6 . 7 5 
l > . 6 6 


ft . 0 'J . 

> 


i . 1 0 y 
1 . 36 


1 . 8 7 
, 11.76 




U . -7 Rf 


tf. 9 " 


0. 90 


12 . 5 1 


C; 


1 c > . 29 




1.11 ~ 


21. 72. 


Q t 


/ . 3 


• S. 


1 . 47 


- 29. 19 


is. 


<3*. 7 3 


0 . 'J 6 


1 .01 


. 16.91 


11. 


1 1 . 34 


. 0 . 99 


1 . 23 


46y88 


i?. 


.1.12 


* 0 . 9 0 


1.15 * 




13. ' / 161. 49 


0 . «i !3 


0.95- , 


54. 62 


14. f 


9 3 2 


0 . 9 9 


1 . 65 


53. 94 


IS. 


19.0$ 


0 . 9 9 


i . 20 


95. 00 


16. 


8. 59 


0 . 9 3 


1,41 ' 


• 7 7.43 


17. 


26. 91 


0 . 9*8 


0. 87 


111.12 


18. 


5. '72 


1.08 , 


• 0. 93 


2 6. s 24 


19. 




0. 99 


1 . 40 


• 3 .'63 


2B. 


0 .97 - 


J3 . 30 


3 . 08 




2. 80 


0.99 


0. 69 


16 . L "0 


22. 


3. 76 


0.99 


0.74 4 


[7 '.3 4 


2 3 . 


5. 30 . 


0. 9t! 


1 . 1<? 


27.01 


Posttest for matched sample of OK subgroup






Table E12

The Three Weibull Parameters for Matrix Algebra Test Items





JO 


m . c . 1 


c - , 




3 . 


1 . 46 


fl. 97 


1 . 75 


f6.02 


? . 


0 . m 


1 . Pf0 


4 . 4 1 


18.57 


9 . 


12.16 :• 


0 . '9 9 


0.. 76 " 


.26,18 


1 1 . 




1 . @' lT i 


0 . 9 9 


34. 18 


12. 


12W3 


W . 9 9 


1 . 22 


85 . 7 1 


13. 


28. 64 


' a) . 9 u 


1.19 


-1 ,? . 7 4 


14. 


1*. 5l' 


3 . 9 9 


1 . 43 


38.64 


11. 


y . 00 


0 . 9 6 


0 . 7 5 


1 40 . l - 5 


16. 


0 . 00 


» - 0.96 


1 .04 


1 49 . 1 6 


17. 


0 . 00 


.? 0.9:3 


,1.31 


9*1 . 28 


lb. 


5 . 63 


. 9 9 


B.75 


• 20. 1 1 


19. 


2.93 : 


0 . 9 8* 


'0. 88 


10. 31 


2 0 . - ' 


2.54 


0.9 9 


1 . 36 


14.1 0 


2 1 . 


.1. 00 


0 . 9 7 


0\ 53 


14.51 


2 2. 


3 . 70 


J . 00 


0.*76 


-6744 


23 . 


4. tf5 


0.9 9 


0.79 ,\ 


* 14.52 


Multpost for NO subgroup in matched sample






Table E13

Kolmogorov-Smirnov Tests for Matrix Algebra Posttest Items, Matched Sample



NO subgroup 



OK subgroup 



item p ) 2 N 
1 ) . 0 . 0000 0 . 0000 0 


item p z n 

1J 0.0000 50 


2) B.BBBB B.BBBB B ■ y 


' £j 0 • OO/D 0 • / £D X ->o 


3) 0.9192 0.5535 — 7 




4 J 0 • 0000 B • 00015 0 


41 AT 46 Iff ff 85"28 56 ' 


or Q70Q or 1707 0 


51 if 08 1 7 1 2646 54 


£\ or aaaa a aaaff or 
Dj. 0.0000 0.0000 0 


61 ff 1957 1 0779 56 


-» % or aaaa a aaaa a 
7 J -0.0000 0.0000 » 


71 0 0231 1/4935 55 


0^ or aaaa a aaaa a 
oj 0.0000 0.0000 0 


ftl 0 1042 1 2153 '55 


/% \ ACQS* vt 

9) 0.9847 \0. 4580 4 


at ff 7 «?•>(! flf 52 

0 • I JJX) B m U f -J -J J t- 


10X- 0.0000 0.0000 0 


1 1 or'* a aaaa ^ 1717 c£ 


It) 0.3983.0.8960 36 


11) 0.7752 0.6607 20 


12) 0.9298 0.5429 27 


12) 0.9940 0.4229 29 


13) 0.8413 0.6168 16 


13) "g. 215^9 1.0547 40 


14) 0.6815 0.7177 12 


14) 0.9332 0.5392 44 


15) 0.8296 0.6250 7 1 . • 


* 15) • 0.20*27 1.0697 49 


16) 0.8272 0.6266 4 . 


16) * 0. 1619 1. 1211 52 


17) 0.7366 0.6847 30 


' 17) 0.1102 1.203£ 26 


18) 0.2239 1.0460 21 


18) 0.9032 0.5664 ,35 


19) 0.9307 0.5419 4 


19), 0.1693 1. 1110 52 


20) . 0.9971 0.4004 5 


2*0) 0.0000 3.6924 51 


21) 0.1835 1.0927 7 


21) 0.1411 1.1513 49 


22) 0. 1523 1 .4*345 21 


22) 07^72^.^1007 35 *- 


23) 0.0133 1.5837 .36 


23) 0.9472 0\5231 20 



Posttest for matched group after 1976 Fall semester; 'goodness
of fit' testing for Gamma distributions



* 






r 



TABLE E14

Kolmogorov-Smirnov Tests for Matrix Algebra Posttest Items

              NO Group                       OK Group
item       p         z        N           p         z          N
  1)                          0        0.3300    0.9480       64
  2)                          0        0.2643    1.0051       64
  3)     .4834     .8382      9        0.9378    0.5342       55
  4)                          0        0.1693    1.1109       63
  5)     .9996     .3536      2        0.3757    0.9125       62
  6)                          0        0.0751    1.2810       64
  7)     .0050    1.7321      3        0.6907    0.7123       61
  8)     .0366    1.4142      2        0.3863    0.9047   [illegible]
  9)     .6504     .7362      7        0.6921    0.7115       57
 10)                          0        0.0237    1.4891       64
 11)     .7463     .6787     41        0.8667    0.5982   [illegible]
 12)     .9350     .5373     33        0.9929    0.4285       31
 13)     .9093     .5628     17        0.8903    0.5794   [illegible]
 14)     .7109     .7002     14        0.7517    0.6754       50
 15)     .6895     .7130      8        0.9255    0.5473       56
 16)     .6686     .7255      5        0.2001    1.0726       59
 17)     .7767     .6598     34        0.4410    0.8662   [illegible]
 18)     .5818     .7771     28        0.8377    0.6194       36
 19)     .6408     .7419      5        0.3289    0.9489       59
 20)     .9984     .3850      5        0.1469    1.1425       59
 21)     .7015     .7058     11        0.5989    0.7668       53
 22)     .9605     .5052     26        0.8014    0.6439       38
 23)     .3914     .9010     44        0.9030    0.5686   [illegible]
 24)                                   0.0000    5.1962       27

Posttest for all subjects after 1976 Fall semester; goodness
of fit testing for Weibull distributions




Table E15

The Three Weibull Parameters for Matrix Algebra Test Items

[Parameter values illegible in the source scan. The surviving footnote fragment identifies the OK subgroup.]



Table E16

The Three Weibull Parameters for Matrix Algebra Test Items

(The item labels and entries for items 4 to 11 are largely illegible; two partly legible rows are shown with "?" as the item label. The heading of the second numeric column is illegible.)

items       t0        [illegible]      c        scale
  1.    [illegible]   [illegible]     0.00       0/0
  2.    [illegible]      0.00         0.00       0/0
  3.       1.46          0.97         1.75    [illegible]
  ?       12.16       [illegible]     0.76      26.18
  ?        9.56       [illegible]     0.99      34.13
 12.      12.73          0.99         1.22      ?5.71
 13.      28.64          0.98         1.19      42.74
 14.      1?.51          0.99         1.43      33.?4
 15.       0.00          0.96         0.75     140.55
 16.       0.00          0.96         1.84     1?9.16
 17.       0.00          0.93         1.31      91.23
 18.       5.63          0.99         0.75      29.11
 19.       2.93          0.93         0.?3      10.31
 20.       2.54          0.99         1.26      14.18
 21.       4.9?       [illegible]     0.53      14.54
 22.       3.70       [illegible]     0.76       6.?4
 23.    [illegible]   [illegible]     0.79      14.??

Multpost for NO subgroup in all subjects



Table E17

Kolmogorov-Smirnov Tests for Matrix Algebra Multpost Items for All Subjects

               NO Group                      OK Group
item       p         z        N           p         z        N
  1)    0.9992    0.3700      5        0.0267    1.4692     51
  2)    1.0000     0/0        1        0.7712    0.6633     55
  3)    0.2090    1.0624     13        0.7644    0.6675     43
  4)    0.9459    0.5?4?     12        0.9997    0.3494     44
  5)    0.0000    5.3453      3        0.1201    1.1859     53
  6)    1.0000     0/0        2        0.2720    0.9982     54
  7)    0.9980    0.3915     30        0.7921    0.6500     26
  8)    0.6979    0.7080     31        0.8316    0.6236     25
  9)    0.9540    0.5143     21        0.9214    0.5514     35
 10)    0.9386    0.5333     24        0.8644    0.5999     32
 11)    0.1970    1.0762     34        0.2368    1.0325     22
 12)    0.9880    0.4479     40        0.4371    0.8689     16
 13)    0.4483    0.8613     35        0.8973    0.5735     21
 14)    0.8602    0.6030     34        0.8358    0.6206     22
 15)    0.6073    0.7618     35        0.6166    0.7563     21
 16)    0.8174    0.6333     29        0.9800    0.4699     27
 17)    0.2751    0.9952     4?        0.9096    0.5626      7
 18)    0.9833    0.4619     26        0.9970    0.4021     30
 19)    0.3216    0.9548     25        0.5501    0.7964     31
 20)    0.9676    0.4941     11        0.0000    3.5313     45
 21)    0.1521    1.1349     31        0.0018    1.8709     25
 22)    0.0000    2.4902     44        0.0138    1.5772     12
 23)    0.0007    1.9862     45        0.9198    0.552?     11

Multpost for all subjects after 76 Fall semester; goodness of fit test for
Gamma distributions



Note: The following programs are used in connection with the programs
listed in this table: "wb2," "wb2area," "kappa," "llab," "kolmo," "gamma,"
"wgraf," and "kgraf," programmed by Robert Bailtie; "stated! t," programmed
by J. Michael Felty.




TABLE E18

Kolmogorov-Smirnov Tests for Matrix Algebra Test Items

OK Subgroup for Matinvtest

                  Weibull                              Gamma
item        p             z         N*           p             z
  1)      .500          .827        27         .002          1.88
  2)      .700          .7??        30         .031          1.45
  3)   [illegible]      .550        28         .980          .470
  4)      .400       [illegible]    17         .376          .912
  5)      .600          .768        27      [illegible]      1.0?0
  6)      .440          .867        29         .017          1.546
  7)      .824          .629        23         .915          .558
  8)      .710          .702        27         .214          1.057
  9)      .737          .685        28         .619          .755
 10)      .292          .980        27         .026          1.471
 11)      .958          .510        20         .722          .694
 12)      .600          .766        24         .006          1.710

* The total N is 30; the NO subgroup was not analyzed.

Weibull parameters

item       t0        c       scale
  1)     18.66      .85      41.50
  2)      8.78      .98      14.80
  3)      2.88     2.29      24.35
  4)      7.51     1.42      27.35
  5)      7.94     0.74      14.68
  6)      7.87     0.91      12.50
  7)      1.61     1.84      20.79
  8)     13.50      .93      36.98
  9)      6.25     1.41      32.21
 10)      5.74      .93      21.23
 11)     15.37     1.03      21.69
 12)      4.92      .63      41.25
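
The z values in the Weibull half of the table can be read as the square root of N times the largest gap between the empirical distribution of response times and the fitted Weibull distribution function. A minimal Python sketch of that computation follows. It assumes the standard three-parameter Weibull form with location t0, shape c, and a scale parameter called m here; the report's own programs ("wb2," "kolmo," and so on) are not reproduced, and the function names below are hypothetical.

```python
import math


def weibull_cdf(t, t0, c, m):
    """Three-parameter Weibull CDF: location t0, shape c, scale m (assumed form)."""
    if t <= t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / m) ** c))


def ks_statistic(times, cdf):
    """Return (D, z) for a sample against a fitted CDF, with z = sqrt(N) * D."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so both sides of the step are compared with the fitted value.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d, math.sqrt(n) * d


# Usage with made-up response times (seconds) and the item 2 parameters above.
d, z = ks_statistic([9.5, 12.0, 15.3, 21.8, 30.1],
                    lambda t: weibull_cdf(t, 8.78, 0.98, 14.80))
```

When every observation lies where the fitted CDF is essentially 1, D equals 1 and z equals the square root of N, which is why very small subgroups can show z values such as 1.4142 or 1.7321.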






TABLE E19

Kolmogorov-Smirnov Tests for Matrix Algebra Test Items

OK Subgroup for Transtest

              Weibull                    Gamma
item       p        z       N*        p        z
  1)     .464     .851      16      .579     .778
  2)     .694     .710      26      .073    1.285
  3)     .146    1.144      34      .011    1.61?
  4)     .941     .530      11      .874     .592
  5)     .950     .519      26      .896     .574
  6)     .061    1.321      29      .062    1.316
  7)     .216    1.054      34      .000    2.304

* The total N is 38

OK Subgroup for Eigtest

              Weibull                    Gamma
item       p        z       N*        p        z
  1)     .971     .488      20      .513     .819
  2)     .626     .751      39      .801     .644
  3)     .829     .625      44      .825     .628
  4)     .918     .555      28      .388     .904
  5)     .666     .727      34      .367     .919
  6)     .999     .285      16      .978     .475
  7)     .609     .761      25      .701     .706
  8)     .727     .691      30      .640     .742

* The total N is 56





TABLE E20

Kolmogorov-Smirnov Tests for Matrix Algebra Test Items

NO Subgroup for Transtest

               Weibull                            Gamma
item       p        z         N*            p            z
  1)     .965     .498        22          .718         .696
  2)     .872     .594        12          .544         .800
  3)     .997     .399         4          .945         .526
  4)     .038    1.407        27          .123        1.1??
  5)     .829     .625        12       [illegible]     .865
  6)     .732     .688    [illegible]     .116        1.193
  7)     .964     .500    [illegible]     .885         .58?

* The total N is 38

NO Subgroup for Eigtest

               Weibull                       Gamma
item        p         z        N*         p        z
  1)     .4665     .8492       36       .477     .843
  2)     .9801     .4697       17       .838     .619
  3)     .9993     .3649       12       .946     .525
  4)     .5266     .8108       28       .587     .774
  5)     .7481     .6776       2?       .832     .623
  6)     .9518     .5173       40       .984     .461
  7)     .7674     .6656       29       .491     .834
  8)     .9323     .5402       24       .959     .507

* The total N is 56



TABLE E21

Kolmogorov-Smirnov Tests for Matrix Algebra Test Items

Weibull Parameters for Transtest and Eigtest Items

        OK subgroup for Transtest           NO subgroup for Transtest
item      t0        c       scale            t0        c       scale
  1)    10.63     1.12      31.65          10.35      .74      95.10
  2)    15.45      .84      67.16           4.09      .80     104.27
  3)     3.80     1.01      13.99           9.33      .84       7.40
  4)     6.15     1.24      22.53           2.03     1.?7      33.58
  5)     5.99     1.23      25.11           2.84     1.11      20.75
  6)     2.86      .94      10.25           5.94      .57       9.73
  7)     2.93      .91       7.36           4.36      .94       5.36

        OK subgroup for Eigtest             NO subgroup for Eigtest
item      t0        c       scale            t0        c       scale
  1)    10.68     1.08      75.94           3.29     1.75      91.00
  2)     3.28     1.63      41.60          15.09     1.07      20.50
  3)     6.88     1.43      27.26          11.22     1.06      14.50
  4)     5.43     1.19      17.42           4.03     1.44      16.98
  5)     7.69     1.04      17.54           3.86     1.16      25.81
  6)    14.09     1.09      22.01           1.66     1.39      53.65
  7)     6.37     1.27      58.50           8.03     1.13      39.00
  8)     8.26     1.04      35.83           1.57?    1.23      34.51
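
The abstract describes the location parameter as the theoretical minimum time and the scale parameter as related to the mean. Under the standard three-parameter Weibull form (an assumption, since the report's exact parameterization is not legible here), that relation can be sketched in Python as below. The names weibull_mean and weibull_median and the argument name m for the scale are hypothetical.

```python
import math


def weibull_mean(t0, c, m):
    """Mean of a three-parameter Weibull: t0 + m * Gamma(1 + 1/c) (assumed form)."""
    return t0 + m * math.gamma(1.0 + 1.0 / c)


def weibull_median(t0, c, m):
    """Median of the same distribution: t0 + m * (ln 2) ** (1/c)."""
    return t0 + m * math.log(2.0) ** (1.0 / c)


# Usage: a tabled row read as t0 = 3.80, c = 1.01, scale = 13.99.
mean_time = weibull_mean(3.80, 1.01, 13.99)
median_time = weibull_median(3.80, 1.01, 13.99)
```

With those three values the mean comes out at about 17.7, that is, roughly the minimum time plus the scale, because c is close to 1.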







Appendix F
Graphs of Conditional Response Rate

[Plot: conditional response rate (vertical axis, 0.000 to 0.133) against time in seconds (horizontal axis, 0 to 50), OK subgroup. The heading of the third legend column is illegible.]

item      c        t0       [illegible]
  3     1.049    6.5019       19.29
  4     1.022    ?.6480       14.47

Figure F1   Comparison of conditional response rates of
            items 3 and 4 for OK subgroup



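[Plot: conditional response rate (vertical axis, 0.000 to 0.561) against time in seconds (horizontal axis, 0 to 50), OK subgroup. The heading of the third legend column is illegible.]

item      c        t0       [illegible]
  5     2.103    5.6420       17.05
  6     0.8525   0.0001       10.79

Figure F2   Comparison of conditional response rates of
            items 5 and 6 for OK subgroup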



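[Plot: conditional response rate (vertical axis, upper tick 0.125) against time in seconds (horizontal axis, 0 to 50), NO subgroup. The heading of the third legend column is illegible.]

item      c        t0       [illegible]
  3     1.240    6.0210       23.34
  4     2.188    0.0001       32.35

Figure F3   Comparison of conditional response rates of
            items 3 and 4 for NO subgroup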



[Plot: conditional response rate (vertical axis, 0.000 to 0.237) against time in seconds (horizontal axis, 0 to 50), NO subgroup. The heading of the third legend column is illegible.]

item      c        t0       [illegible]
  5     1.251    7.5420       13.70
  6     1.353    5.5105       12.67

Figure F4   Comparison of conditional response rates of
            items 5 and 6 for NO subgroup
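
The conditional response rate plotted in these figures is what the system-reliability literature calls the hazard rate. For a standard three-parameter Weibull distribution it has a closed form, sketched below in Python. This assumes the usual parameterization with location t0, shape c, and a scale called m here, and it treats the third legend value as that scale, which the scan does not confirm; the function name weibull_hazard is hypothetical.

```python
def weibull_hazard(t, t0, c, m):
    """Weibull hazard (conditional response rate) at time t.

    h(t) = (c / m) * ((t - t0) / m) ** (c - 1) for t > t0, else 0.
    The rate rises with time when c > 1, is constant when c = 1,
    and falls with time when c < 1.
    """
    if t <= t0:
        return 0.0
    return (c / m) * ((t - t0) / m) ** (c - 1.0)


# Usage: tabulate the rate at 10-second steps for one set of legend values.
curve = [(t, weibull_hazard(t, 6.0210, 1.240, 23.34)) for t in range(0, 51, 10)]
```

Evaluating the function over 0 to 50 seconds for each pair of items gives curves of the kind compared in Figures F1 to F4.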






